dta <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
dta <- dta[, c("gre", "gpa")]
head(dta)
##   gre  gpa
## 1 380 3.61
## 2 660 3.67
## 3 800 4.00
## 4 640 3.19
## 5 520 2.93
## 6 760 3.00
Scatterplot of GRE (x-axis) against GPA (y-axis) for the 400 students
plot(dta, type = "p", xlab = "GRE score", ylab = "GPA")
grid()
\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i ,~~ \epsilon_i \sim N(0, \sigma^2) \]
In words: GPA = intercept + slope × GRE + error, where \(y_i\) is the GPA and \(x_i\) the GRE score of student \(i\).
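As a minimal sketch of what fitting this model means, the least-squares slope and intercept can be computed in closed form and compared with `lm()`. The helper `ols_simple()` and the simulated `x` and `y` are hypothetical stand-ins, not part of the original analysis; the true values 2.6, 0.0013 and 0.35 are chosen only to resemble the GRE/GPA scale.

```r
# Closed-form least-squares estimates for y = b0 + b1 * x + error.
# ols_simple() is a hypothetical helper written for this illustration.
ols_simple <- function(x, y) {
  b1 <- cov(x, y) / var(x)        # slope: sample covariance over variance of x
  b0 <- mean(y) - b1 * mean(x)    # intercept: line passes through the means
  c(intercept = b0, slope = b1)
}

# Simulated stand-in data (assumption: not the admissions data set).
set.seed(1)
x <- runif(50, min = 200, max = 800)
y <- 2.6 + 0.0013 * x + rnorm(50, sd = 0.35)

est <- ols_simple(x, y)
print(est)

# The hand-computed estimates should agree with lm() up to rounding error.
print(all.equal(unname(est), unname(coef(lm(y ~ x)))))
```

The same function could be called as `ols_simple(dta$gre, dta$gpa)` once `dta` has been loaded.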
Print 4 significant digits and suppress the significance stars.
options(digits = 4, show.signif.stars = FALSE)
summary(m0 <- lm(gpa ~ gre, data = dta))
##
## Call:
## lm(formula = gpa ~ gre, data = dta)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.0867 -0.2244 -0.0002  0.2481  0.7618
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.645898   0.091310    29.0  < 2e-16
## gre         0.001266   0.000152     8.3  1.6e-15
##
## Residual standard error: 0.352 on 398 degrees of freedom
## Multiple R-squared: 0.148, Adjusted R-squared: 0.146
## F-statistic: 68.9 on 1 and 398 DF, p-value: 1.6e-15
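In these data, each additional GRE point is associated with an expected increase of about 0.0013 GPA points (standard error 0.00015). The estimated residual standard deviation (\(\hat{\sigma}\)) is 0.352.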
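To give the slope a sense of scale, the sketch below rebuilds an approximate 95% confidence interval from the estimate, standard error, and residual degrees of freedom printed in the summary above, and then rescales it to a 100-point GRE difference. The variable names are illustrative, and the inputs are the rounded values from the printed output, so the result may differ slightly from what `confint(m0)` would report.

```r
# Values copied from the printed coefficient table (rounded as displayed).
est <- 0.001266   # slope estimate for gre
se  <- 0.000152   # its standard error
df  <- 398        # residual degrees of freedom

# t statistic: estimate divided by its standard error.
t_value <- est / se

# Two-sided 95% interval based on the t distribution.
t_crit <- qt(0.975, df)
ci <- est + c(-1, 1) * t_crit * se

print(round(t_value, 1))
print(ci)         # interval per 1 GRE point
print(ci * 100)   # interval per 100 GRE points
```

Expressed per 100 GRE points, the effect is easier to read than a slope in the fourth decimal place.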
anova(m0)
## Analysis of Variance Table
##
## Response: gpa
##            Df Sum Sq Mean Sq F value  Pr(>F)
## gre         1    8.5    8.53      69 1.6e-15
## Residuals 398   49.3    0.12
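The ANOVA table carries the same information as the model summary. As a rough check, the sketch below recomputes R-squared, the residual standard error, and the F statistic from the sums of squares printed above. The variable names are illustrative, and because the inputs are rounded, the results match the summary only approximately.

```r
# Sums of squares as printed in the ANOVA table (rounded).
ss_model <- 8.53   # gre row: with 1 df, Sum Sq equals Mean Sq
ss_resid <- 49.3   # Residuals row
df_resid <- 398

# Share of total variation in GPA accounted for by GRE.
r_squared <- ss_model / (ss_model + ss_resid)

# Residual mean square and its square root (residual standard error).
ms_resid  <- ss_resid / df_resid
sigma_hat <- sqrt(ms_resid)

# F statistic: model mean square over residual mean square.
f_stat <- (ss_model / 1) / ms_resid

print(round(c(r_squared = r_squared,
              sigma_hat = sigma_hat,
              f_stat    = f_stat), 3))
```

With a single predictor, the F statistic is also the square of the slope's t value.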
plot(dta, type = "p", xlab = "GRE score", ylab = "GPA")
abline(m0, lty = 2)
grid()
Check whether the residuals show any systematic pattern.
plot(resid(m0) ~ fitted(m0),
     xlab = "Fitted values",
     ylab = "Residuals",
     ylim = c(-1.5, 1.5))
abline(h = 0, lty = 2)
grid()
qqnorm(resid(m0))
qqline(resid(m0))
grid()
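One optional way to make a pattern in the residuals easier to spot is to overlay a smoother on the residual-versus-fitted plot: if the linear model is adequate, the smoothed curve should stay close to the zero line. The sketch below uses simulated stand-in data (`sim` and `fit` are hypothetical names, not the admissions data), so it runs on its own.

```r
# Simulated stand-in data on a GRE/GPA-like scale (assumption).
set.seed(42)
sim <- data.frame(gre = round(runif(400, min = 220, max = 800)))
sim$gpa <- 2.65 + 0.0013 * sim$gre + rnorm(400, sd = 0.35)

fit <- lm(gpa ~ gre, data = sim)
r <- resid(fit)
f <- fitted(fit)

# Residuals against fitted values, with a zero reference line.
plot(f, r, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# lowess() returns the smoothed curve as a list with components x and y.
sm <- lowess(f, r)
lines(sm, col = "red")
grid()
```

For the model in this document, the equivalent call would be `lines(lowess(fitted(m0), resid(m0)), col = "red")` after the residual plot has been drawn.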