Chapter 4

Numerical Descriptive Measures

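Chapter 1 of this course already introduced the R commands for the mean, median, range, variance, and standard deviation. This chapter continues from there and groups the measures into four categories: measures of central location, measures of variability, measures of relative standing, and measures of linear relationship.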

Measures of Central Location

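This category uses the math quiz scores of 15 students in one class as the example and computes their arithmetic mean, median, geometric mean, and mode.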

score.student <- c(10,20,50,50,60,65,65,70,70,70,70,70,80,90,100)
# math quiz scores of 15 students in one class
mean(score.student)               # arithmetic mean

## [1] 62.66667

median(score.student)             # median

## [1] 70

exp(mean(log(score.student)))     # geometric mean

## [1] 55.68241
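The expression exp(mean(log(x))) gives the geometric mean because the n-th root of a product equals the exponential of the average of the logarithms. As a quick check, the sketch below compares it with the direct definition. The names gm.log and gm.direct are introduced here for illustration only, and the comparison assumes every value is strictly positive.

```r
score.student <- c(10,20,50,50,60,65,65,70,70,70,70,70,80,90,100)
n <- length(score.student)

gm.log    <- exp(mean(log(score.student)))   # log form used in this chapter
gm.direct <- prod(score.student)^(1/n)       # direct definition: n-th root of the product

all.equal(gm.log, gm.direct)                 # the two forms agree up to floating-point error
```

The log form is usually preferable for long vectors, since the raw product can overflow or underflow before the root is taken.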

which.max(table(score.student))   # mode

## 70 
##  6

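The result 70 is the value that occurs most often. The 6 does not mean that 70 occurs 6 times (it actually occurs 5 times). It is the position of the 70 group among the distinct scores that table() tabulates: 70 is the sixth entry of the frequency table.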
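Because which.max() reports a position rather than a count, and returns only the first maximum when several values tie, it can help to read the mode and its frequency directly from the table. The following is a minimal sketch of one way to do that. The names freq, mode.value, and mode.count are illustrative and not part of the original example.

```r
score.student <- c(10,20,50,50,60,65,65,70,70,70,70,70,80,90,100)

freq <- table(score.student)                  # frequency of each distinct score
mode.count <- max(freq)                       # highest frequency
# every score whose frequency equals the maximum (keeps all tied modes)
mode.value <- as.numeric(names(freq)[as.vector(freq) == mode.count])

mode.value   # 70
mode.count   # 5
```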

Measures of Variability

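Measures of central location alone cannot describe the whole distribution, so the range, variance, standard deviation, and coefficient of variation are used to examine how much the data vary.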

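score <- sample(1:100,50)    # draw 50 distinct random integers from 1 to 100
# no seed is set, so the values printed below will differ from run to run
max(score)-min(score)        # range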

## [1] 98

var(score)                   # variance

## [1] 848.1065

sd(score)                    # standard deviation

## [1] 29.12227

sd(score)/mean(score)        # coefficient of variation

## [1] 0.5902365
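To make clear what var() and sd() compute, the sketch below rebuilds them from the definition of the sample variance, which divides by n - 1 rather than n. The call to set.seed(1) is an assumption added only to make this sketch reproducible, so its numbers will not match the output printed above. The names var.manual, sd.manual, and cv.manual are illustrative.

```r
set.seed(1)                        # assumption: fixed seed, used only for reproducibility here
score <- sample(1:100, 50)
n <- length(score)

var.manual <- sum((score - mean(score))^2) / (n - 1)   # sample variance (divisor n - 1)
sd.manual  <- sqrt(var.manual)                         # standard deviation = sqrt(variance)
cv.manual  <- sd.manual / mean(score)                  # coefficient of variation

all.equal(var.manual, var(score))
all.equal(sd.manual, sd(score))
```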

Measures of Relative Standing

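To examine relative standing within a distribution, a box plot gives a convenient view of how the data are spread. Here we use R's built-in ToothGrowth dataset, which records tooth growth in 60 guinea pigs, 10 in each of six treatment groups. Each animal received vitamin C at one of three dose levels (0.5, 1, or 2 mg/day) through one of two delivery methods, orange juice (OJ) or ascorbic acid (VC). We draw a box plot of this data: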

boxplot(len~supp,                            # len on the y-axis, grouped by supp on the x-axis
        data=ToothGrowth,                    # data source
        col=c("green","red"),                # fill colors
        main="box plot for len by OJ&VC",    # title
        ylab="tooth length"                  # y-axis label
        )

The box plot shows, among other things, the first quartile (Q1), the second quartile (Q2, the median), and the third quartile (Q3) of each group.
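The values behind the boxes can also be read off numerically instead of from the picture. The sketch below is one possible way to do this: it asks boxplot() for its statistics without drawing, then computes the quartiles with quantile() for comparison. The names bp, box.stats, and q.by.supp and the row labels are illustrative. Note that boxplot() uses hinges, which can differ slightly from the default quantile() quartiles, while the medians agree.

```r
# statistics that boxplot() would draw, without producing a plot
bp <- boxplot(len ~ supp, data = ToothGrowth, plot = FALSE)
box.stats <- bp$stats
rownames(box.stats) <- c("lower whisker", "Q1 (lower hinge)", "Q2 (median)",
                         "Q3 (upper hinge)", "upper whisker")
colnames(box.stats) <- bp$names
box.stats

# quartiles per supplement group from quantile() (default type 7)
q.by.supp <- sapply(split(ToothGrowth$len, ToothGrowth$supp),
                    quantile, probs = c(0.25, 0.5, 0.75))
q.by.supp
```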

Measures of Linear Relationship

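When two interval variables are to be examined and a linear relationship between them is assumed, measures of linear relationship describe how they vary together. The scatter diagram introduced in Chapter 3 used the cars dataset as its example and showed that speed and stopping distance are linearly related. We examine the same data with the covariance, the coefficient of correlation, and the coefficient of determination.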

cov(cars$speed,cars$dist)     # covariance

## [1] 109.9469

cor(cars$speed,cars$dist)     # coefficient of correlation

## [1] 0.8068949
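The two measures are directly related: the coefficient of correlation is the covariance divided by the product of the two standard deviations, which removes the units and bounds the result between -1 and 1. A minimal sketch of that relationship, with x, y, and r.manual as illustrative names:

```r
x <- cars$speed
y <- cars$dist

r.manual <- cov(x, y) / (sd(x) * sd(y))   # covariance rescaled by both standard deviations

all.equal(r.manual, cor(x, y))            # agrees with cor() up to floating-point error
```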

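Before the coefficient of determination can be obtained, a linear equation relating the two variables must be fitted to the cars data by the method of least squares. We do this with lm():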

lm.cars<-lm(cars$dist~cars$speed)    # fit a simple linear regression to the cars data and store it in lm.cars
summary(lm.cars)                     # inspect the contents of lm.cars

## 
## Call:
## lm(formula = cars$dist ~ cars$speed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## cars$speed    3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

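The second-to-last line of the output shows Multiple R-squared: 0.6511. This value is the coefficient of determination we want: the model lm.cars explains about 65.1% of the variation in stopping distance, and the remaining 34.9% is unexplained.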
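The coefficient of determination does not have to be read from the printed summary. The sketch below shows three equivalent routes for a simple linear regression: extracting it from the summary object, squaring the coefficient of correlation, and computing one minus the ratio of residual to total sum of squares. The names r2.summary, r2.cor, r2.manual, rss, and tss are illustrative.

```r
lm.cars <- lm(cars$dist ~ cars$speed)

r2.summary <- summary(lm.cars)$r.squared          # value printed as "Multiple R-squared"
r2.cor     <- cor(cars$speed, cars$dist)^2        # squared correlation (one predictor only)

rss <- sum(residuals(lm.cars)^2)                  # variation left unexplained by the model
tss <- sum((cars$dist - mean(cars$dist))^2)       # total variation in dist
r2.manual <- 1 - rss / tss

c(r2.summary, r2.cor, r2.manual)                  # all three are about 0.6511
```

The equality with the squared correlation holds only for a regression with a single predictor and an intercept. With several predictors, the sum-of-squares form is the general definition.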