Reading the data

The gpa and gre of the first six students.

dta <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
dta <- dta[, c("gpa", "gre")]


head(dta)
   gpa gre
1 3.61 380
2 3.67 660
3 4.00 800
4 3.19 640
5 2.93 520
6 3.00 760
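
Before fitting anything, it can help to confirm that both columns were read as numeric and that no values are missing. The following is a minimal sketch, not part of the original analysis: `first6` is a hypothetical stand-in rebuilt from the six rows printed above so that the code runs without a network connection. The same calls apply unchanged to the full `dta`.

```r
# Hypothetical stand-in for dta: the six rows shown by head(dta) above.
first6 <- data.frame(
  gpa = c(3.61, 3.67, 4.00, 3.19, 2.93, 3.00),
  gre = c(380, 660, 800, 640, 520, 760)
)

str(first6)                                    # column types and dimensions
col_is_numeric <- sapply(first6, is.numeric)   # TRUE for each numeric column
n_missing <- colSums(is.na(first6))            # number of NA values per column
print(col_is_numeric)
print(n_missing)
```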

Basic statistical graphics

The R code below draws a scatter plot of the dta data set.

plot(dta, type = 'p', xlab = "gpa", ylab = "gre")
grid()

Linear model analysis

\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i ,~~ \epsilon_i \sim N(0, \sigma^2) \]

gre = intercept + slope × gpa + error term (normally distributed)
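
For a model with a single predictor, the least-squares estimates have a closed form: the slope is cov(x, y) / var(x) and the intercept is mean(y) minus the slope times mean(x). The sketch below checks this against `lm()`. It uses only the six rows displayed earlier (the data frame `first6` and the names `b0`, `b1`, `fit6` are illustrative assumptions), so its estimates will differ from those of the full 400-row fit.

```r
# Hypothetical stand-in for dta: the six rows shown by head(dta).
first6 <- data.frame(
  gpa = c(3.61, 3.67, 4.00, 3.19, 2.93, 3.00),
  gre = c(380, 660, 800, 640, 520, 760)
)

# Closed-form least-squares estimates for simple linear regression.
b1 <- with(first6, cov(gpa, gre) / var(gpa))     # slope
b0 <- with(first6, mean(gre) - b1 * mean(gpa))   # intercept

# The same estimates obtained from lm().
fit6 <- lm(gre ~ gpa, data = first6)
same_estimates <- all.equal(unname(coef(fit6)), c(b0, b1))
print(same_estimates)
```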

Summary of the analysis

Print four significant digits and suppress the significance stars.

options(digits = 4, show.signif.stars = FALSE)
summary(m0 <- lm(gre ~ gpa, data = dta))

Call:
lm(formula = gre ~ gpa, data = dta)

Residuals:
    Min      1Q  Median      3Q     Max 
-302.39  -62.79   -2.21   68.51  283.44 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    192.3       47.9    4.01  7.2e-05
gpa            116.6       14.0    8.30  1.6e-15

Residual standard error: 107 on 398 degrees of freedom
Multiple R-squared:  0.148, Adjusted R-squared:  0.146 
F-statistic: 68.9 on 1 and 398 DF,  p-value: 1.6e-15

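In these data on 400 students, a gpa that is 1.0 higher is associated with an average gre about 116.6 points higher (standard error 14.0). The estimated residual standard deviation is \(\hat{\sigma} = 107\), and the model accounts for a proportion \(R^2 = 0.148\) of the variance in gre.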
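
The reported coefficients can be turned into an approximate 95% confidence interval for the slope and a fitted mean at a chosen gpa. This is a sketch using the rounded values copied from the summary output, so the last digit may differ slightly from what the model object gives. The gpa value of 3.0 is an arbitrary illustration.

```r
# Rounded values copied from the summary(m0) output.
b0 <- 192.3       # intercept estimate
b1 <- 116.6       # slope estimate
se_b1 <- 14.0     # standard error of the slope
df_resid <- 398   # residual degrees of freedom

t_crit <- qt(0.975, df = df_resid)         # two-sided 95% critical value
ci_b1 <- b1 + c(-1, 1) * t_crit * se_b1    # approximate 95% CI for the slope
pred_gpa3 <- b0 + b1 * 3.0                 # fitted mean gre at gpa = 3.0

print(round(ci_b1, 1))
print(pred_gpa3)
```

When the fitted object is available, `confint(m0)` and `predict(m0, newdata = data.frame(gpa = 3))` give the same quantities without the rounding.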

Analysis of variance table

anova(m0)
Analysis of Variance Table

Response: gre
           Df  Sum Sq Mean Sq F value  Pr(>F)
gpa         1  786185  786185      69 1.6e-15
Residuals 398 4538099   11402                
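
The ANOVA table and the summary output describe the same fit, and the link can be checked by hand: R-squared is the model sum of squares over the total, the F statistic is the ratio of the two mean squares, and the residual standard error is the square root of the residual mean square. A short sketch with the sums of squares copied from the table above (the variable names are illustrative):

```r
# Values copied from the anova(m0) table.
ss_model <- 786185    # sum of squares for gpa
ss_resid <- 4538099   # residual sum of squares
df_model <- 1
df_resid <- 398

ms_resid  <- ss_resid / df_resid                 # residual mean square
r_squared <- ss_model / (ss_model + ss_resid)    # proportion of variance explained
f_stat    <- (ss_model / df_model) / ms_resid    # F statistic
sigma_hat <- sqrt(ms_resid)                      # residual standard error

print(round(c(R2 = r_squared, F = f_stat, sigma = sigma_hat), 3))
```

Up to rounding, these agree with the 0.148, 68.9 and 107 reported by `summary(m0)`.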

Fitted model plot

plot(dta, xlab = "gpa", ylab = "gre")
abline(m0, lty = 2)
grid()

Residual plot

Check whether the residuals show any systematic pattern.

# y-axis limits widened so that all residuals (about -302 to 283) are visible
plot(resid(m0) ~ fitted(m0), xlab = "Fitted values", 
     ylab = "Residuals", ylim = c(-320, 320))
grid()
abline(h = 0, lty = 2)
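
When reading this plot, note that least-squares residuals always sum to zero and are uncorrelated with the fitted values by construction, so a straight-line trend cannot appear. The patterns worth looking for are curvature or a changing spread. The sketch below illustrates the two algebraic properties on the six rows displayed earlier (`first6` and `fit6` are hypothetical names, not objects from the original analysis).

```r
# Hypothetical stand-in for dta: the six rows shown by head(dta).
first6 <- data.frame(
  gpa = c(3.61, 3.67, 4.00, 3.19, 2.93, 3.00),
  gre = c(380, 660, 800, 640, 520, 760)
)
fit6 <- lm(gre ~ gpa, data = first6)

resid_sum <- sum(resid(fit6))                 # zero up to floating-point error
resid_cor <- cor(resid(fit6), fitted(fit6))   # zero up to floating-point error

print(signif(c(sum = resid_sum, cor = resid_cor), 3))
```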

Checking the normality of the residuals

qqnorm(resid(m0))
qqline(resid(m0))
grid()
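
To make explicit what the normal Q-Q plot displays: `qqnorm()` plots the sorted residuals against standard normal quantiles evaluated at the plotting positions given by `ppoints()`. A minimal sketch on the six rows displayed earlier (`first6`, `fit6` and `qq` are illustrative names), with `plot.it = FALSE` so that only the coordinates are returned:

```r
# Hypothetical stand-in for dta: the six rows shown by head(dta).
first6 <- data.frame(
  gpa = c(3.61, 3.67, 4.00, 3.19, 2.93, 3.00),
  gre = c(380, 660, 800, 640, 520, 760)
)
fit6 <- lm(gre ~ gpa, data = first6)
r <- unname(resid(fit6))

qq <- qqnorm(r, plot.it = FALSE)         # coordinates only, no plot
theoretical <- qnorm(ppoints(length(r))) # normal quantiles at the plotting positions

# The x-coordinates, once sorted, are exactly these theoretical quantiles,
# and the y-coordinates are the residuals themselves.
x_matches <- all.equal(sort(qq$x), theoretical)
y_matches <- all.equal(qq$y, r)
print(c(x_matches, y_matches))
```

Points that fall close to the reference line drawn by `qqline()` are consistent with the normality assumption for the error term.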

End

Display the session information for this exercise.

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Traditional)_Taiwan.950 
[2] LC_CTYPE=Chinese (Traditional)_Taiwan.950   
[3] LC_MONETARY=Chinese (Traditional)_Taiwan.950
[4] LC_NUMERIC=C                                
[5] LC_TIME=Chinese (Traditional)_Taiwan.950    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mosaic_1.1.1      Matrix_1.2-12     mosaicData_0.16.0 ggformula_0.6.2  
[5] ggplot2_2.2.1     lattice_0.20-35   dplyr_0.7.4      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15     pillar_1.2.1     compiler_3.4.3   plyr_1.8.4      
 [5] bindr_0.1        tools_3.4.3      digest_0.6.15    nlme_3.1-131    
 [9] evaluate_0.10.1  tibble_1.4.2     gtable_0.2.0     pkgconfig_2.0.1 
[13] rlang_0.2.0      psych_1.7.8      parallel_3.4.3   yaml_2.1.17     
[17] ggdendro_0.1-20  bindrcpp_0.2     gridExtra_2.3    stringr_1.3.0   
[21] knitr_1.20       rprojroot_1.3-2  grid_3.4.3       mosaicCore_0.4.2
[25] glue_1.2.0       R6_2.2.2         foreign_0.8-69   rmarkdown_1.9   
[29] reshape2_1.4.3   tidyr_0.8.0      purrr_0.2.4      magrittr_1.5    
[33] backports_1.1.2  scales_0.5.0     htmltools_0.3.6  MASS_7.3-47     
[37] splines_3.4.3    mnormt_1.5-5     assertthat_0.2.0 colorspace_1.3-2
[41] stringi_1.1.6    lazyeval_0.2.1   munsell_0.4.3    broom_0.4.3