W11

# Load the data
library(faraway)  # assumed source of `seeds`; the original omits the library() call
data("seeds")

# First six rows and structure of the data
seeds |> head() |> knitr::kable()

| germ | moisture | covered |
|-----:|---------:|:--------|
|   22 |        1 | no      |
|   41 |        3 | no      |
|   66 |        5 | no      |
|   82 |        7 | no      |
|   79 |        9 | no      |
|    0 |       11 | no      |

str(seeds)

## 'data.frame':    48 obs. of  3 variables:
##  $ germ    : num  22 41 66 82 79 0 25 46 72 73 ...
##  $ moisture: num  1 3 5 7 9 11 1 3 5 7 ...
##  $ covered : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...

Multiple Regression Analysis

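# Plot germ against moisture, with a separate least-squares line for each level of covered
library(ggplot2)
ggplot(aes(y = germ, x = moisture, color = covered), data = seeds) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()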

## `geom_smooth()` using formula 'y ~ x'

## Warning: Removed 1 rows containing non-finite values (stat_smooth).

## Warning: Removed 1 rows containing missing values (geom_point).

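On the `no` line, moisture and germ show essentially zero correlation.

On the `yes` line, moisture and germ are negatively correlated. The regression model below tests this relationship. Note that the model's moisture coefficient is a slope (change in germ per unit of moisture), not a correlation, and because the model has no interaction term it is a single slope shared by both groups.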
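The two lines in the plot can be quantified by fitting a separate simple regression within each level of `covered`, which is what `geom_smooth(method = "lm")` does when colour splits the data into groups. The sketch below is a minimal illustration on a simulated data frame (`toy`, hypothetical, constructed to mimic the pattern described above), not on `seeds`, so its numbers are illustrative only.

```r
# Simulated stand-in for seeds (hypothetical data, not the real dataset)
set.seed(1)
toy <- data.frame(
  moisture = rep(c(1, 3, 5, 7, 9, 11), times = 8),
  covered  = factor(rep(c("no", "yes"), each = 24), levels = c("no", "yes"))
)
# Flat relationship for "no", negative slope for "yes", plus random noise
toy$germ <- ifelse(toy$covered == "no", 50, 80 - 5 * toy$moisture) + rnorm(48, sd = 5)

# One simple regression per group: the same lines geom_smooth(method = "lm") draws
fits <- lapply(split(toy, toy$covered), function(d) lm(germ ~ moisture, data = d))
slopes <- sapply(fits, function(f) unname(coef(f)["moisture"]))

# Pearson correlation between moisture and germ within each group
cors <- sapply(split(toy, toy$covered), function(d) cor(d$moisture, d$germ))

print(round(slopes, 2))
print(round(cors, 2))
```

Running the same two `sapply()` calls on `seeds` (after dropping its missing row) would give the group-specific slopes and correlations behind the two lines in the plot.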

#multiple linear regression
mod <- lm(germ ~ moisture + covered, data = seeds)
summary(mod)

## 
## Call:
## lm(formula = germ ~ moisture + covered, data = seeds)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.526 -26.798   2.901  24.275  39.182 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  64.2390     8.8400   7.267 4.65e-09 ***
## moisture     -2.7134     1.1514  -2.357    0.023 *  
## coveredyes   -0.6601     7.8853  -0.084    0.934    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27.02 on 44 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1121, Adjusted R-squared:  0.07174 
## F-statistic: 2.777 on 2 and 44 DF,  p-value: 0.07312

Interpreting the regression model

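One observation has a missing value (NA). lm() cannot use incomplete rows, so under its default na.action (na.omit) it drops that row automatically. The model is therefore fit to 47 observations, which gives 47 - 3 = 44 residual degrees of freedom.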
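A minimal sketch of this default behaviour, using a small made-up data frame `d` (hypothetical values) with one missing response. `nobs()` reports how many rows the fit actually used, and `complete.cases()` identifies the row that was dropped; calling `summary(fit)` on it would print the same "observation deleted due to missingness" note seen above.

```r
# Hypothetical data frame with one missing response value
d <- data.frame(
  y = c(22, 41, NA, 82, 79, 60),
  x = c(1, 3, 5, 7, 9, 11)
)

# Default na.action is na.omit: rows with NA in any model variable are dropped
fit <- lm(y ~ x, data = d)

n_used  <- nobs(fit)                  # observations actually used in the fit
df_res  <- df.residual(fit)           # n_used minus number of coefficients (2 here)
dropped <- which(!complete.cases(d))  # row(s) lm() silently removed

print(c(n_used = n_used, df_res = df_res))
print(dropped)
```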

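Fitted equation: predicted germ = 64.24 - 2.71*moisture - 0.66*coveredyes. The residual standard error, 27.02, is the estimated standard deviation of the error term; it is not a constant to be added to the equation.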
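To make the equation concrete, the sketch below plugs the coefficients from the `summary(mod)` output into a small helper. `predict_germ()` is a hypothetical function written for illustration; moisture = 5 is an arbitrary example value.

```r
# Coefficients copied from the summary(mod) output above
b0      <- 64.2390  # (Intercept)
b_moist <- -2.7134  # moisture
b_yes   <- -0.6601  # coveredyes

# Hypothetical helper: predicted germ for a moisture value and a 0/1 indicator for covered == "yes"
predict_germ <- function(moisture, covered_yes) {
  b0 + b_moist * moisture + b_yes * covered_yes
}

no_line  <- predict_germ(5, 0)  # covered = "no"
yes_line <- predict_germ(5, 1)  # covered = "yes"
print(c(no = no_line, yes = yes_line))
```

With the fitted model itself, `predict(mod, newdata = data.frame(moisture = 5, covered = c("no", "yes")))` should give the same values up to rounding of the printed coefficients. Whatever moisture value is used, the gap between the two predictions is the coveredyes coefficient.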

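Because covered is a categorical variable, lm() codes it as a 0/1 indicator, coveredyes (1 for yes, 0 for no). When covered is yes: predicted germ = 64.24 - 2.71*moisture - 0.66*1 = 63.58 - 2.71*moisture. When covered is no: predicted germ = 64.24 - 2.71*moisture. The two lines therefore share the same slope and differ only in intercept, by 0.66.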
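The 0/1 coding can be inspected directly with `model.matrix()`, which returns the design matrix `lm()` builds. Under R's default treatment contrasts the first factor level (`no`) is the reference and gets no column of its own. The four-row vectors below are hypothetical, chosen only to show the coding.

```r
# A factor with the same levels as seeds$covered ("no" is the reference level)
covered  <- factor(c("no", "yes", "no", "yes"), levels = c("no", "yes"))
moisture <- c(1, 1, 3, 3)

# The design matrix lm() would use: "no" becomes 0 and "yes" becomes 1 in column coveredyes
X <- model.matrix(~ moisture + covered)
print(X)
```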

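Null hypothesis: after controlling for moisture, mean germ does not differ between covered = yes and covered = no (the coveredyes coefficient is 0).

Alternative hypothesis: after controlling for moisture, mean germ differs between yes and no (the coveredyes coefficient is not 0).

(A t-test checks whether two means differ; in a regression, the t test on each coefficient checks whether that coefficient differs from 0.)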
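The link between the two kinds of t test can be checked numerically. When a two-level factor is the only predictor, the t statistic for its coefficient equals the pooled-variance two-sample t statistic (with the sign flipped, because `t.test()` subtracts the second group's mean from the first's). In `mod`, moisture is also in the model, so the coveredyes test compares the groups after adjusting for moisture. The data below are simulated and hypothetical.

```r
set.seed(42)
# Hypothetical two-group data (not from seeds)
g <- factor(rep(c("no", "yes"), each = 10), levels = c("no", "yes"))
y <- c(rnorm(10, mean = 50, sd = 10), rnorm(10, mean = 45, sd = 10))

# Regression on a single two-level factor: coefficient "gyes" = mean(yes) - mean(no)
t_lm <- summary(lm(y ~ g))$coefficients["gyes", "t value"]

# Classic pooled-variance two-sample t-test: statistic uses mean(no) - mean(yes)
t_tt <- unname(t.test(y ~ g, var.equal = TRUE)$statistic)

# Same magnitude, opposite sign
print(c(lm = t_lm, t.test = t_tt))
```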

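The t value (Estimate / Std. Error) tests whether a coefficient differs significantly from 0. For moisture, t = -2.36; its absolute value exceeds the two-sided 5% critical value (about 2.02 for 44 degrees of freedom), and p = 0.023 < .05, so the effect of moisture is statistically significant.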
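The t value and p-value for moisture can be reproduced from the printed estimate, standard error, and residual degrees of freedom. This is a small arithmetic check using numbers copied from the output; because the inputs are rounded, the results match only to about three decimals.

```r
# Values copied from the coefficient table of summary(mod)
est    <- -2.7134  # moisture estimate
se     <- 1.1514   # its standard error
df_res <- 44       # residual degrees of freedom

t_val <- est / se                                             # t = Estimate / Std. Error
p_val <- 2 * pt(abs(t_val), df = df_res, lower.tail = FALSE)  # two-sided p-value

print(round(c(t = t_val, p = p_val), 3))
```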

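The negative β for moisture indicates a negative relationship: holding covered constant, each additional unit of moisture is associated with a decrease of about 2.71 in predicted germ.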

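For covered, t = -0.084. Its absolute value is far below the critical value (about 2.02 for 44 degrees of freedom; 1.96 is the large-sample normal approximation), and p = 0.934 > .05, so we cannot reject the null hypothesis.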
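The exact two-sided 5% cutoff for 44 residual degrees of freedom comes from `qt()`; it is slightly larger than the normal value 1.96 from `qnorm()`. The sketch compares the coveredyes t statistic (recomputed from the printed estimate and standard error) against it.

```r
df_res <- 44
t_crit <- qt(0.975, df = df_res)  # two-sided 5% critical value of t with 44 df
z_crit <- qnorm(0.975)            # large-sample normal value, about 1.96

t_covered <- -0.6601 / 7.8853     # coveredyes: Estimate / Std. Error
reject    <- abs(t_covered) > t_crit  # FALSE means H0 is not rejected at alpha = .05

print(round(c(t_crit = t_crit, z_crit = z_crit, t_covered = t_covered), 3))
print(reject)
```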

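Conclusion: the moisture coefficient is -2.71 (a slope, not a correlation) and is statistically significant. The estimated difference between covered = yes and covered = no is -0.66, which is not significantly different from 0, so we cannot reject the null hypothesis that covering makes no difference.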

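R-squared is 0.1121, so the model explains 11.21% of the variance in germ (adjusted R-squared 0.0717). The overall F test (F = 2.777 on 2 and 44 DF, p = 0.073) is not significant at the .05 level.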
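The overall F statistic and adjusted R-squared follow from R-squared, the number of predictors, and the residual degrees of freedom. The check below uses the printed (rounded) values, so it agrees with the output to about three or four decimals.

```r
r2     <- 0.1121  # Multiple R-squared from summary(mod)
k      <- 2       # number of predictors (moisture, coveredyes)
df_res <- 44      # residual df: 47 observations used minus 3 coefficients

# Overall F: explained variance per predictor over unexplained variance per residual df
f_stat <- (r2 / k) / ((1 - r2) / df_res)
p_val  <- pf(f_stat, k, df_res, lower.tail = FALSE)

# Adjusted R-squared penalises for the number of predictors
n      <- df_res + k + 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res

print(round(c(F = f_stat, p = p_val, adj_r2 = adj_r2), 4))
```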

Residual plots

#normality of the residuals
hist( x = residuals(mod),
      xlab = "Value of residual",
      main = "",
      breaks = 20)

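Linear regression assumes the errors are normally distributed with mean 0. If that assumption holds, the histogram of residuals should look roughly bell-shaped and centered at 0. (With an intercept in the model the residuals always average to 0, so it is the shape that matters.)

This histogram, however, does not look like a normal distribution: the residuals are spread widely (from about -40 to +39, with quartiles near -27 and +24) rather than concentrated around 0. Large residuals also mean the model leaves much of germ unexplained, consistent with the low R-squared.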
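A histogram can be complemented by a formal test. The sketch below fits the same model formula to a simulated, hypothetical data frame whose errors are deliberately drawn from a flat (uniform) distribution, confirms that the OLS residuals average to 0, and runs `shapiro.test()`, whose null hypothesis is that the residuals come from a normal distribution. Applied to the real model, the call would be `shapiro.test(residuals(mod))`.

```r
set.seed(7)
# Hypothetical data shaped like seeds; errors drawn from a uniform distribution on purpose
toy <- data.frame(
  moisture = rep(c(1, 3, 5, 7, 9, 11), times = 8),
  covered  = factor(rep(c("no", "yes"), each = 24), levels = c("no", "yes"))
)
toy$germ <- 64 - 2.7 * toy$moisture + runif(48, min = -40, max = 40)

fit <- lm(germ ~ moisture + covered, data = toy)
res <- residuals(fit)

# With an intercept in the model, OLS residuals average to (numerically) zero,
# so a mean of 0 says nothing about normality
print(mean(res))

# Shapiro-Wilk test of H0: residuals come from a normal distribution
sw <- shapiro.test(res)
print(sw)
```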

# Normal Q-Q plot of the model's standardized residuals
plot(mod, which = 2)

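A Q-Q plot is another way to check whether the residuals are approximately normally distributed: if they are, the points fall close to the reference line.

Here the points do not follow the line and many lie far from it, so the residuals do not appear to be normally distributed. This may indicate that some of the variation in germ is not explained by the predictors in the model.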
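The points drawn by `plot(mod, which = 2)` are the standardized residuals against theoretical normal quantiles. The sketch below computes those coordinates with `qqnorm(..., plot.it = FALSE)` for a simulated, hypothetical data frame and summarizes straightness with the correlation between the two axes, a rough numeric companion to the plot rather than a formal test.

```r
set.seed(7)
# Hypothetical data shaped like seeds, with normal errors
toy <- data.frame(
  moisture = rep(c(1, 3, 5, 7, 9, 11), times = 8),
  covered  = factor(rep(c("no", "yes"), each = 24), levels = c("no", "yes"))
)
toy$germ <- 64 - 2.7 * toy$moisture + rnorm(48, sd = 25)
fit <- lm(germ ~ moisture + covered, data = toy)

# Standardized residuals, the quantity shown on the Normal Q-Q plot of plot.lm
std_res <- rstandard(fit)
qq <- qqnorm(std_res, plot.it = FALSE)  # x = theoretical quantiles, y = sample values

# A correlation close to 1 means the points lie close to a straight line;
# a clearly lower value signals systematic departure from normality
qq_cor <- cor(qq$x, qq$y)
print(round(qq_cor, 3))
```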


TPY

2022-04-26
