# Read the UCLA graduate-admissions data and keep only GPA and GRE
dta <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
dta <- dta[, c("gpa", "gre")]
# Show the first six rows
head(dta)
## gpa gre
## 1 3.61 380
## 2 3.67 660
## 3 4.00 800
## 4 3.19 640
## 5 2.93 520
## 6 3.00 760
The R code below draws a scatter plot of GRE against GPA.
plot(dta, type = 'p', xlab = "GPA", ylab = "GRE")
grid()
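\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i ,~~ \epsilon_i \sim N(0, \sigma^2) \]
Here \(y_i\) is the GRE score and \(x_i\) the GPA of student \(i\). In words: GRE = intercept + slope × GPA + error, where the error is normally distributed.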
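For a single predictor, the least-squares estimates of the intercept and slope have a closed form: the slope is the covariance of x and y divided by the variance of x. The sketch below is only an illustration on the six rows printed by `head(dta)` above, not on the full data set, so its numbers will differ from the full fit. The names `six`, `b0_six` and `b1_six` are introduced here for illustration.

```r
# First six rows of the data, copied from the head(dta) output above
six <- data.frame(gpa = c(3.61, 3.67, 4.00, 3.19, 2.93, 3.00),
                  gre = c(380, 660, 800, 640, 520, 760))

# Closed-form least-squares estimates for simple linear regression
b1_six <- cov(six$gpa, six$gre) / var(six$gpa)       # slope
b0_six <- mean(six$gre) - b1_six * mean(six$gpa)     # intercept

# lm() returns the same two numbers
fit_six <- lm(gre ~ gpa, data = six)
rbind(closed_form = c(b0_six, b1_six), lm = coef(fit_six))
```

The two rows of the printed matrix should agree, which is a quick way to confirm what `lm()` is estimating.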
Print 4 significant digits and suppress the significance stars.
options(digits = 4, show.signif.stars = FALSE)
summary(m0 <- lm(gre ~ gpa, data = dta))
##
## Call:
## lm(formula = gre ~ gpa, data = dta)
##
## Residuals:
## Min 1Q Median 3Q Max
## -302.39 -62.79 -2.21 68.51 283.44
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 192.3 47.9 4.01 7.2e-05
## gpa 116.6 14.0 8.30 1.6e-15
##
## Residual standard error: 107 on 398 degrees of freedom
## Multiple R-squared: 0.148, Adjusted R-squared: 0.146
## F-statistic: 68.9 on 1 and 398 DF, p-value: 1.6e-15
According to these data, each additional point of GPA is associated with a GRE score that is about 117 points higher on average.
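To make the slope concrete, the following sketch plugs the rounded estimates from the summary output above into the fitted line and compares two GPA values one point apart. It also computes an approximate 95% confidence interval for the slope. The names `int_hat`, `slope_hat` and `se_slope` are illustrative, and because the inputs are rounded the results are approximate.

```r
# Rounded estimates copied from the summary(m0) output above
int_hat   <- 192.3   # intercept
slope_hat <- 116.6   # slope for gpa
se_slope  <- 14.0    # standard error of the slope

# Predicted mean GRE at GPA = 3 and GPA = 4
pred <- int_hat + slope_hat * c(3, 4)
pred
diff(pred)   # the difference equals the slope

# Approximate 95% confidence interval for the slope (398 residual df)
ci_slope <- slope_hat + c(-1, 1) * qt(0.975, df = 398) * se_slope
ci_slope
```

With the fitted object available, `predict(m0, newdata = data.frame(gpa = c(3, 4)))` and `confint(m0)` give the same quantities from the unrounded estimates.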
anova(m0)
## Analysis of Variance Table
##
## Response: gre
## Df Sum Sq Mean Sq F value Pr(>F)
## gpa 1 786185 786185 69 1.6e-15
## Residuals 398 4538099 11402
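The ANOVA table carries the same information as the last three lines of the regression summary. As a check, the sketch below recomputes the F statistic, R-squared and residual standard error from the sums of squares printed above. The variable names are illustrative, and the inputs are the rounded values shown in the table.

```r
# Sums of squares and degrees of freedom copied from the anova(m0) table above
ss_gpa <- 786185;  df_gpa <- 1
ss_res <- 4538099; df_res <- 398

ms_res <- ss_res / df_res                # residual mean square
f_stat <- (ss_gpa / df_gpa) / ms_res     # F statistic
r2     <- ss_gpa / (ss_gpa + ss_res)     # multiple R-squared
rse    <- sqrt(ms_res)                   # residual standard error

round(c(F = f_stat, R2 = r2, RSE = rse), 3)

# p-value of the F test
p_val <- pf(f_stat, df_gpa, df_res, lower.tail = FALSE)
p_val
```

The results should match the summary output up to rounding: an F statistic near 69, an R-squared near 0.148 and a residual standard error near 107.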
plot(dta, type = "p", xlab = "GPA", ylab = "GRE")
abline(m0, lty = 2)
grid()
Check whether the residuals show any systematic pattern.
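# Raw residuals range from about -302 to 283, so they would fall outside
# ylim = c(-3.5, 3.5); plot standardized residuals instead
plot(rstandard(m0) ~ fitted(m0), xlab = "Fitted values",
     ylab = "Standardized residuals", ylim = c(-3.5, 3.5))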
grid()
abline(h = 0, lty = 2)
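Residuals divided by their estimated standard deviation are unit-free and mostly fall within about ±3, which is the scale on which axis limits such as ±3.5 make sense. The sketch below shows how standardized residuals can be computed by hand and checks them against `rstandard()`. It uses only the six rows printed by `head(dta)` above, and the names `six`, `fit_six` and `std_res` are illustrative.

```r
# First six rows of the data, copied from the head(dta) output above
six <- data.frame(gpa = c(3.61, 3.67, 4.00, 3.19, 2.93, 3.00),
                  gre = c(380, 660, 800, 640, 520, 760))
fit_six <- lm(gre ~ gpa, data = six)

h <- hatvalues(fit_six)   # leverage of each observation
s <- sigma(fit_six)       # residual standard error

# Standardized residual: raw residual over its estimated standard deviation
std_res <- resid(fit_six) / (s * sqrt(1 - h))

# Should be TRUE: this is what rstandard() computes
all.equal(std_res, rstandard(fit_six))
```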
qqnorm(resid(m0))
qqline(resid(m0))
grid()