Straight Line Model

part a)

\((x1, y1)\)

fit1 <- lm(y1 ~ x1, data = anscombe)
summary(fit1)

## 
## Call:
## lm(formula = y1 ~ x1, data = anscombe)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.92127 -0.45577 -0.04136  0.70941  1.83882 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   3.0001     1.1247   2.667  0.02573 * 
## x1            0.5001     0.1179   4.241  0.00217 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.237 on 9 degrees of freedom
## Multiple R-squared:  0.6665, Adjusted R-squared:  0.6295 
## F-statistic: 17.99 on 1 and 9 DF,  p-value: 0.00217

\((x2, y2)\)

fit2 <- lm(y2 ~ x2, data = anscombe)
summary(fit2)

\((x3, y3)\)

fit3 <- lm(y3 ~ x3, data = anscombe)
summary(fit3)

\((x4, y4)\)

fit4 <- lm(y4 ~ x4, data = anscombe)
summary(fit4)
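As a compact alternative to the four separate calls, the sketch below fits all four pairs in a loop and collects the headline numbers from `summary()` into one table, one row per model. The object names `fits` and `overview` are illustrative only; `reformulate()` builds each formula `y_i ~ x_i` from strings.

```r
# anscombe ships with base R (datasets package), so nothing needs installing.
# Build the formulas y1 ~ x1, ..., y4 ~ x4 and fit each one.
fits <- lapply(1:4, function(i) {
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})
names(fits) <- paste0("fit", 1:4)

# One row per model with the numbers compared in part b
overview <- t(sapply(fits, function(f) {
  s <- summary(f)
  c(intercept = unname(coef(f)[1]),
    slope     = unname(coef(f)[2]),
    r.squared = s$r.squared,
    sigma     = s$sigma,      # residual standard error
    df        = s$df[2])      # residual degrees of freedom
}))
round(overview, 4)
```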

part b)

part i)

Looking at the estimates for \(\beta_0\), they are all approximately \(3.0\), and their standard errors are alike at about \(1.12\). The p-value for \(H_0: \beta_0=0\) is approximately \(0.026\) in each model, which is evidence against \(H_0\) at the 5% level. So the four models are consistent in their intercepts.
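The intercept comparison can be read off directly from the coefficient matrix that `summary()` returns. This is a minimal sketch; `fits` and `intercepts` are illustrative names.

```r
# Fit the four pairs (anscombe is in base R's datasets package)
fits <- lapply(1:4, function(i)
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe))

# coef(summary(fit)) is a matrix with columns
# "Estimate", "Std. Error", "t value", "Pr(>|t|)"; keep the intercept row
intercepts <- t(sapply(fits, function(f) coef(summary(f))["(Intercept)", ]))
rownames(intercepts) <- paste0("fit", 1:4)
round(intercepts, 4)
```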

part ii)

The four models are consistent in their estimates of \(\beta_1\), all around \(0.50\). The standard errors are also similar, approximately \(0.118\). The p-value for \(H_0: \beta_1=0\) is about \(0.002\) in each model, indicating strong evidence against \(H_0\). Therefore, the four models are consistent in their slopes as well.
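To see why the slopes and their standard errors agree so closely, the closed-form least-squares formulas \(\hat\beta_1 = S_{xy}/S_{xx}\) and \(\text{se}(\hat\beta_1) = \hat\sigma/\sqrt{S_{xx}}\) can be evaluated by hand. This is a minimal sketch; the helper `slope_by_hand` is a hypothetical name, not part of any package.

```r
# Hypothetical helper: closed-form simple linear regression
#   b1 = Sxy / Sxx,  b0 = ybar - b1 * xbar
#   se(b1) = sigma_hat / sqrt(Sxx),  sigma_hat^2 = SSE / (n - 2)
slope_by_hand <- function(x, y) {
  sxx <- sum((x - mean(x))^2)
  sxy <- sum((x - mean(x)) * (y - mean(y)))
  b1 <- sxy / sxx
  b0 <- mean(y) - b1 * mean(x)
  sse <- sum((y - b0 - b1 * x)^2)
  sigma_hat <- sqrt(sse / (length(x) - 2))
  c(Sxx = sxx, slope = b1, se = sigma_hat / sqrt(sxx))
}

by_hand <- t(sapply(1:4, function(i)
  slope_by_hand(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])))
rownames(by_hand) <- paste0("pair", 1:4)
round(by_hand, 4)
```

All four \(x\) columns share \(\bar{x} = 9\) and \(S_{xx} = 110\), so once the residual standard errors are nearly equal, the slope standard errors \(\hat\sigma/\sqrt{S_{xx}}\) must be nearly equal too.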

part iii)

All four models have a multiple \(R^2\) of approximately \(0.67\). This means that roughly two thirds (about 67%) of the total variation in the observed responses is explained by the linear regression on \(x\).
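The multiple \(R^2\) is \(1 - \text{SSE}/\text{SST}\), the share of the variation of \(y\) around its mean that the fitted line accounts for. The sketch below computes it by hand and compares it with the value `summary()` reports; `r2_by_hand` is a hypothetical helper name.

```r
# Hypothetical helper: R^2 = 1 - SSE / SST
r2_by_hand <- function(fit) {
  y <- model.response(model.frame(fit))  # observed responses used in the fit
  sse <- sum(residuals(fit)^2)           # unexplained variation
  sst <- sum((y - mean(y))^2)            # total variation around the mean
  1 - sse / sst
}

fits <- lapply(1:4, function(i)
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe))
r2 <- sapply(fits, r2_by_hand)

# Side by side with summary()'s value
cbind(by_hand = r2, summary = sapply(fits, function(f) summary(f)$r.squared))
```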

part iv)

All four models have a residual standard error of approximately \(1.24\) on \(9\) degrees of freedom (11 observations minus 2 estimated coefficients), indicating that the typical size of a residual is about the same in every model.
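The residual standard error is \(\hat\sigma = \sqrt{\text{SSE}/(n - p)}\), here with \(n = 11\) and \(p = 2\). A minimal sketch that recomputes it and checks it against `sigma()` from the stats package:

```r
fits <- lapply(1:4, function(i)
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe))

# sqrt(SSE / residual df); df.residual() gives n - p = 9 here
rse <- sapply(fits, function(f) sqrt(sum(residuals(f)^2) / df.residual(f)))

rbind(by_hand = rse,
      sigma   = sapply(fits, sigma),
      df      = sapply(fits, df.residual))
```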

part c)

plot(anscombe$y1 ~ anscombe$x1,
     xlab="x1",
     ylab='y1',
     main="Anscombe's (x1, y1) Pair",
     cex.lab=1.5)
# adding the least squares fitted line
coefs <- lm(y1 ~ x1, data = anscombe)$coefficients
abline(a=coefs[1], b=coefs[2],
       col='red', lwd=2)

plot(anscombe$y2 ~ anscombe$x2,
     xlab="x2",
     ylab='y2',
     main="Anscombe's (x2, y2) Pair",
     cex.lab=1.5)
# adding the least squares fitted line
coefs <- lm(y2 ~ x2, data = anscombe)$coefficients
abline(a=coefs[1], b=coefs[2],
       col='red', lwd=2)

plot(anscombe$y3 ~ anscombe$x3,
     xlab="x3",
     ylab='y3',
     main="Anscombe's (x3, y3) Pair",
     cex.lab=1.5)
# adding the least squares fitted line
coefs <- lm(y3 ~ x3, data = anscombe)$coefficients
abline(a=coefs[1], b=coefs[2],
       col='red', lwd=2)

plot(anscombe$y4 ~ anscombe$x4,
     xlab="x4",
     ylab='y4',
     main="Anscombe's (x4, y4) Pair",
     cex.lab=1.5)
# adding the least squares fitted line
coefs <- lm(y4 ~ x4, data = anscombe)$coefficients
abline(a=coefs[1], b=coefs[2],
       col='red', lwd=2)
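The four plots can also be drawn in a single 2 x 2 figure with shared axis limits, which makes the differences in shape easier to compare. This is a sketch; the limits `c(3, 20)` and `c(2, 14)` are chosen here to cover every pair and are not from the original analysis.

```r
# Four panels in one figure; remember the old layout so it can be restored
old_par <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  # Same limits in every panel so the shapes are directly comparable
  plot(x, y, xlim = c(3, 20), ylim = c(2, 14),
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = sprintf("Anscombe's (x%d, y%d) Pair", i, i))
  abline(lm(y ~ x), col = "red", lwd = 2)  # least-squares fitted line
}
par(old_par)  # restore the previous layout
```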

part e)

I conclude that it is not enough to rely solely on the numerical summaries of a fitted model, since data can have structure that a simple linear model does not capture. In the Anscombe dataset, four models with nearly identical summary statistics come from data that look very different: pair 1 is roughly linear, pair 2 is curved, pair 3 is linear apart from one outlier, and in pair 4 every \(x\) equals 8 except for a single high-leverage point at \(x = 19\). Therefore, statistical analysis has to be paired with visual analysis.
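The same point shows up in the raw summaries: the means, variances and correlations of the four pairs nearly coincide, while the designs behind them differ. A minimal sketch; `pair_stats` is an illustrative name.

```r
# Per-pair summaries that match closely across the quartet
pair_stats <- t(sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), var_x = var(x),
    mean_y = mean(y), var_y = var(y),
    cor_xy = cor(x, y))
}))
rownames(pair_stats) <- paste0("pair", 1:4)
round(pair_stats, 3)

# ...yet x4 takes only two distinct values, so the pair-4 slope
# is determined entirely by the single observation at x = 19
table(anscombe$x4)
```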