dta <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
dta <- dta[, c("gre", "gpa")]
The first six rows of the GRE and GPA scores for 400 students.
head(dta)
  gre  gpa
1 380 3.61
2 660 3.67
3 800 4.00
4 640 3.19
5 520 2.93
6 760 3.00
Scatterplot of GPA (X axis) against GRE (Y axis) for the 400 students.
plot(gre ~ gpa, data = dta, type = 'p', xlab = "GPA score", ylab = "GRE score")
grid()
\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i ,~~ \epsilon_i \sim N(0, \sigma^2) \]
GRE = intercept parameter + slope parameter × GPA + error (normally distributed)
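To make the model statement concrete, here is a minimal simulation sketch: it generates data from the same linear form and checks that lm() recovers the parameters. The values of beta0, beta1, sigma and the uniform range for x are illustrative assumptions, not estimates from the admissions data.

```r
# Simulate from y_i = beta0 + beta1 * x_i + eps_i, eps_i ~ N(0, sigma^2).
# All parameter values below are illustrative assumptions.
set.seed(1)
n     <- 400
beta0 <- 200   # assumed intercept
beta1 <- 100   # assumed slope
sigma <- 100   # assumed error standard deviation

x   <- runif(n, min = 2, max = 4)       # hypothetical GPA-like predictor
eps <- rnorm(n, mean = 0, sd = sigma)   # normally distributed errors
y   <- beta0 + beta1 * x + eps          # hypothetical GRE-like response

sim <- data.frame(x = x, y = y)
fit <- lm(y ~ x, data = sim)

coef(fit)            # estimates of beta0 and beta1
summary(fit)$sigma   # estimate of sigma
```

The estimates should land near the assumed values, with the gap shrinking as n grows.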
Print four significant digits and suppress the significance stars.
options(digits = 4, show.signif.stars = FALSE)
summary(m0 <- lm(gre ~ gpa, data = dta))
Call:
lm(formula = gre ~ gpa, data = dta)

Residuals:
    Min      1Q  Median      3Q     Max
-302.39  -62.79   -2.21   68.51  283.44

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    192.3       47.9    4.01  7.2e-05
gpa            116.6       14.0    8.30  1.6e-15

Residual standard error: 107 on 398 degrees of freedom
Multiple R-squared:  0.148, Adjusted R-squared:  0.146
F-statistic: 68.9 on 1 and 398 DF,  p-value: 1.6e-15
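Based on these data, each additional point of GPA is associated with an increase of about 116.6 points in mean GRE (standard error 14.0). The estimated residual standard deviation is \(\hat{\sigma} = 107\).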
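The interpretation of the slope can be checked by hand. The sketch below plugs the rounded estimates from the summary() output into the fitted line and forms an approximate 95% confidence interval for the slope. The GPA values 3 and 4 are chosen only for illustration, and because the inputs are rounded the results are approximate.

```r
# Rounded estimates copied from the summary() output.
b0    <- 192.3   # intercept
b1    <- 116.6   # slope for gpa
se_b1 <- 14.0    # standard error of the slope
df    <- 398     # residual degrees of freedom

# Fitted mean GRE at two illustrative GPA values one point apart.
gpa        <- c(3, 4)
fitted_gre <- b0 + b1 * gpa
fitted_gre         # 542.1 658.7
diff(fitted_gre)   # equals the slope, 116.6

# Approximate 95% confidence interval for the slope.
ci <- b1 + c(-1, 1) * qt(0.975, df) * se_b1
ci
```

With the fitted object m0 available, predict(m0, newdata = data.frame(gpa = c(3, 4))) and confint(m0) give the same quantities from the unrounded estimates.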
anova(m0)
Analysis of Variance Table

Response: gre
           Df  Sum Sq Mean Sq F value  Pr(>F)
gpa         1  786185  786185      69 1.6e-15
Residuals 398 4538099   11402
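The ANOVA table and the summary() output describe the same fit, so the headline quantities can be recomputed from the table entries. The entries are themselves rounded, so the agreement is approximate:
\[ R^2 = \frac{786185}{786185 + 4538099} \approx 0.148, \qquad F = \frac{786185}{11402} \approx 69, \qquad \hat{\sigma} = \sqrt{11402} \approx 107 \]
With a single predictor, the F statistic is also the square of the slope's t value: \(8.30^2 \approx 68.9\).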
plot(gre ~ gpa, data = dta, xlab = "GPA score", ylab = "GRE score")
abline(m0, lty = 2)
grid()
Check whether the residuals show any systematic pattern.
plot(resid(m0) ~ fitted(m0), xlab = "Fitted values",
ylab = "Residuals")
grid()
abline(h = 0, lty = 2)
qqnorm(resid(m0))
qqline(resid(m0))
grid()
Show the session information for this exercise.
sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Traditional)_Taiwan.950
[2] LC_CTYPE=Chinese (Traditional)_Taiwan.950
[3] LC_MONETARY=Chinese (Traditional)_Taiwan.950
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Traditional)_Taiwan.950
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.2 backports_1.1.2 magrittr_1.5 rprojroot_1.3-2
[5] tools_3.4.2 htmltools_0.3.6 yaml_2.1.16 Rcpp_0.12.13
[9] stringi_1.1.5 rmarkdown_1.8 knitr_1.20 stringr_1.2.0
[13] digest_0.6.12 evaluate_0.10.1