dta <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
dta <- dta[, c("gre", "gpa")]
Scatter plot of GRE (x-axis) against GPA (y-axis) for the 400 students
plot(dta, type = "p", xlab = "GRE score", ylab = "GPA")
grid()
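\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i ,~~ \epsilon_i \sim N(0, \sigma^2) \] That is, GPA = intercept + slope × GRE + residual, where \(y_i\) is the GPA and \(x_i\) the GRE score of student \(i\).
Report results to four decimal places and omit the significance stars.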
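To make the model statement concrete, the following is a minimal sketch that simulates data from a simple linear regression of this form and refits it by least squares. The parameter values, the uniform GRE range, and all `*_true` / `*_sim` names are illustrative assumptions, not quantities taken from the data.

```r
# Sketch: simulate y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ N(0, sigma^2).
# Every numeric value below is an illustrative assumption.
set.seed(1)                                  # reproducible draws
n          <- 400                            # same sample size as the data set
beta0_true <- 2.6                            # assumed intercept
beta1_true <- 0.0013                         # assumed slope per GRE point
sigma_true <- 0.35                           # assumed residual standard deviation

gre_sim <- runif(n, min = 220, max = 800)    # assumed range of GRE scores
gpa_sim <- beta0_true + beta1_true * gre_sim +
  rnorm(n, mean = 0, sd = sigma_true)        # add normal errors

fit_sim <- lm(gpa_sim ~ gre_sim)             # ordinary least squares fit
coef(fit_sim)                                # estimates of beta0 and beta1
summary(fit_sim)$sigma                       # estimate of sigma
```

With 400 observations the estimates should land close to the assumed values, which is the sense in which the fitted coefficients estimate \(\beta_0\), \(\beta_1\) and \(\sigma\).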
summary(m0 <- lm(gpa ~ gre, data = dta))
##
## Call:
## lm(formula = gpa ~ gre, data = dta)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.08675 -0.22435 -0.00015  0.24809  0.76176
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.6458978  0.0913100  28.977  < 2e-16 ***
## gre         0.0012660  0.0001525   8.304  1.6e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3518 on 398 degrees of freedom
## Multiple R-squared: 0.1477, Adjusted R-squared: 0.1455
## F-statistic: 68.95 on 1 and 398 DF, p-value: 1.596e-15
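The summary above is printed with R's defaults. One possible way to get a coefficient table rounded to four decimal places and free of significance stars is sketched below. `coef_table` is a hypothetical helper introduced here for illustration; it is not part of the original analysis.

```r
# Hypothetical helper: coefficient table of a fitted lm object,
# rounded to a fixed number of decimal places.
coef_table <- function(fit, digits = 4) {
  tab <- coef(summary(fit))  # matrix: Estimate, Std. Error, t value, Pr(>|t|)
  round(tab, digits)         # plain numeric matrix, so no stars are attached
}
```

Calling `coef_table(m0)` would return the table for the model fitted above. Alternatively, `options(show.signif.stars = FALSE)` turns the stars off for all subsequent summaries.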
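In these data, each additional GRE point is associated with an increase of about 0.0013 in GPA (standard error 0.0002). The estimated residual standard deviation, \(\hat{\sigma}\), is 0.3518.
plot(dta, type = "p", xlab = "GRE score", ylab = "GPA")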
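Because the per-point slope is tiny, it can help to rescale it and attach an interval. The sketch below works only from the numbers in the coefficient table printed above; the variable names are illustrative.

```r
# Values copied from the coefficient table printed above.
b1    <- 0.0012660   # estimated slope for gre
se_b1 <- 0.0001525   # standard error of the slope
df    <- 398         # residual degrees of freedom

# Expected GPA difference for a 100-point GRE difference on the fitted line.
gpa_per_100 <- 100 * b1

# 95% confidence interval for the slope, using the t distribution.
t_crit <- qt(0.975, df)
ci_b1  <- c(lower = b1 - t_crit * se_b1, upper = b1 + t_crit * se_b1)

gpa_per_100
ci_b1
```

The same interval should be obtainable directly from the fitted object with `confint(m0, "gre")`.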
abline(m0, lty = 2)
grid()
Check whether the residuals show any systematic pattern
plot(resid(m0) ~ fitted(m0), xlab = "Fitted values",
ylab = "Residuals", ylim = c(-1.5, 1.5))
abline(h = 0, lty = 2)
grid()
qqnorm(resid(m0))
qqline(resid(m0))
grid()
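The plots above are judged by eye. As a rough numerical companion, a few summary checks could be computed as in the sketch below. `residual_checks` is a hypothetical helper, and the choice of the Shapiro-Wilk test and of the correlation between absolute residuals and fitted values is an assumption about what is worth checking, not something prescribed by the original analysis.

```r
# Hypothetical helper: numerical companions to the residual plots.
residual_checks <- function(fit) {
  r <- resid(fit)    # residuals
  f <- fitted(fit)   # fitted values
  list(
    mean_resid  = mean(r),                  # essentially 0 for OLS with an intercept
    cor_abs_fit = cor(abs(r), f),           # far from 0 hints at non-constant variance
    shapiro_p   = shapiro.test(r)$p.value   # small value hints at non-normal residuals
  )
}
```

Calling `residual_checks(m0)` would apply these checks to the model fitted above; they supplement the plots rather than replace them.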