本週上課將介紹迴歸的假設檢定。因為資料分析的樣本來自於母體,所以樣本的迴歸係數不同於母體的迴歸係數。例如我們想知道母體的平均數\(\mu\),我們假設\(\hat{\mu}=\bar{y}=\frac{1}{n}\Sigma_{i=1}^{n}y_{i}\)。同理,我們想要系統化\(\hat{\beta}_{0}\)與\(\hat{\beta}_{1}\)對於\(\beta_{0}\)跟\(\beta_{1}\)的估計。
\[ \hat{u}_{i}=y_{i}-\hat{y}_{i}=y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i} \]
\[ MSD(\hat{u})\equiv\frac{1}{n}\sum\limits_{i=1}^{n}(\hat{u}_{i}-\bar{\hat{u}})^2=\frac{1}{n}\sum\limits_{i=1}^{n}\hat{u}_{i}^2 \]
\[ \begin{eqnarray} E[MSD(\hat{u})] & = & \frac{n-2}{n}\sigma^{2}_{u}\\ \sigma^{2}_{u} & = &\frac{n}{n-2}MSD(\hat{u}) \\ & =& \frac{n}{n-2}\cdot\frac{1}{n}\sum\limits_{i=1}^{n}\hat{u}_{i}^2 \\ & =& \frac{1}{n-2}\sum\limits_{i=1}^{n}\hat{u}_{i}^2 \end{eqnarray} \]
\(\frac{1}{n-2}\sum\limits_{i=1}^{n}\hat{u}_{i}^2\)代入: \[ \begin{eqnarray} $\hat{V}[\hat{\beta}_{1}|X] & = &\mathcal{\frac{\sigma_{u}^{2}}{\sum_{i=1}^{n}(x-\bar{x})^2}} \\ & = & \frac{\sigma_{u}^{2}}{SST_{x}} \end{eqnarray} \] \[ \begin{eqnarray} \hat{V}[\hat{\beta}_{0}|X] & = &\sigma_{u}^{2}\mathcal{\{\frac{1}{n}+\frac{\bar{x}^2}{\sum_{i=1}^{n}(x-\bar{x})^2}}\}\\ & = &\mathcal{\frac{\sigma_{u}^2\sum_{i=1}^{n}x^2}{n\sum_{i=1}^{n}(x-\bar{x})^2}} \end{eqnarray} \] where \[ \sigma_{u}^{2}=\frac{1}{n-2}\sum\limits_{i=1}^{n}\hat{u}_{i}^2 \]
\(\blacksquare\) 取開根號:\(\sqrt{\hat{V}[\hat{\beta}_{0}|X]}\)以及\(\sqrt{\hat{V}[\hat{\beta}_{1}|X]}\)\[ u\sim N(0,\sigma_{u}^{2}) \]
\[ \mathrm{Y}|\mathrm{X} \sim N(\beta_{0}+\beta_{1}X, \sigma_{u}^{2}) \]
也就是變異數同質性以及誤差項的平均值為0的假設。
\[ \sigma^2=\frac{\Sigma(y-\hat{y})^2}{n-2} \] \[ \mathrm{SE}(\hat{\beta_{1}})=\sqrt{\frac{\sigma^2}{\sum(x_{i}-\bar{x})^2}} \]
\(95\%\)的信賴區間相當於兩個標準誤,係數加減兩個標準誤構成一個\(95\%\)的信賴區間:
\[
\mathcal{[\hat{\beta_{1}}-2\cdot \mathrm{SE}(\hat{\beta_{1}}), \hspace{.3em}\hat{\beta_{1}}+2\cdot \mathrm{SE}(\hat{\beta_{1}})]}
\]
\(\it{H}_{0}\): X與Y沒有關係,也就是\(\it{H}_{0}\): \(\beta_{1}=0\)
\(\it{H}_{a}\): X與Y有某種關係,也就是\(\it{H}_{a}\): \(\beta_{1} \neq 0\)
\[ t=\frac{\hat{\beta}_{1}-0}{\mathrm{SE}(\hat{\beta}_{1})} \]
R
可以計算\(t\)分布的百分位的機率,也可以計算機率對應的百分位。首先計算特定機率的百分位:
qt(0.975, 1000)
## [1] 1.962339
qt(0.950, 1000)
## [1] 1.646379
qt(0.900, 1000)
## [1] 1.282399
qt(0.5, 1000)
## [1] 0
qt(0.1, 1000)
## [1] -1.282399
qt(0.05, 1000)
## [1] -1.646379
qt(0.025, 1000)
## [1] -1.962339
pt(1.68, 1000)
## [1] 0.9533652
pt(1.96, 1000)
## [1] 0.9748634
pt(2.00, 1000)
## [1] 0.9771148
請問大學教師的薪水與性別有關嗎?使用car
裡面的Salaries
資料:
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
fit1 <- with(Salaries, lm(salary ~ sex))
summary(fit1)
##
## Call:
## lm(formula = salary ~ sex)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57290 -23502 -6828 19710 116455
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 101002 4809 21.001 < 2e-16 ***
## sexMale 14088 5065 2.782 0.00567 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30030 on 395 degrees of freedom
## Multiple R-squared: 0.01921, Adjusted R-squared: 0.01673
## F-statistic: 7.738 on 1 and 395 DF, p-value: 0.005667
h0=1-pt(1.96, 395)
h1=1-pt(2.78, 395)
h0; h1
## [1] 0.02534923
## [1] 0.002847915
請問氣溫與風力大小、月份有關嗎?我們分析airquality
資料:
fit2 <- with(airquality, lm(Temp ~ Wind+factor(Month)))
summary(fit2)
##
## Call:
## lm(formula = Temp ~ Wind + factor(Month))
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.8929 -4.4212 0.0799 3.4794 17.8880
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 74.1885 2.0544 36.112 < 2e-16 ***
## Wind -0.7434 0.1488 -4.996 1.64e-06 ***
## factor(Month)6 12.5436 1.5943 7.868 7.18e-13 ***
## factor(Month)7 16.3621 1.6183 10.110 < 2e-16 ***
## factor(Month)8 16.3163 1.6239 10.047 < 2e-16 ***
## factor(Month)9 10.2792 1.5959 6.441 1.59e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.175 on 147 degrees of freedom
## Multiple R-squared: 0.5884, Adjusted R-squared: 0.5744
## F-statistic: 42.03 on 5 and 147 DF, p-value: < 2.2e-16
\[
\hat{\beta_{1}}=\frac{\sum (x_{i}-\bar{x})(y_{i}-\bar{y})}{\sum(y_{i}-\bar{y})^2}
\] \[
\sigma^2=\frac{\Sigma(y-\hat{y})^2}{n-2}
\] \[
\mathrm{SE}(\hat{\beta_{1}})=\sqrt{\frac{\sigma^2}{\sum(x_{i}-\bar{x})^2}}
\]
library(kableExtra)
DT<-data.frame(X=c(4, 3, 5, 2, 4, 2, 2, 3, 2, 2, 2, 3, 5, 1, 1),
Y=c(5, 5, 5, 3, 4, 3, 3, 4, 4, 5, 4, 5, 3, 2, 1))
DT %>%
kable("html") %>%
kable_styling()
X | Y |
---|---|
4 | 5 |
3 | 5 |
5 | 5 |
2 | 3 |
4 | 4 |
2 | 3 |
2 | 3 |
3 | 4 |
2 | 4 |
2 | 5 |
2 | 4 |
3 | 5 |
5 | 3 |
1 | 2 |
1 | 1 |
DT2<-data.frame(X=c(2, 1, 3, 4, 5),
Y=c(3, 4, 5, 4, 4))
DT2 %>%
kable("html") %>%
kable_styling()
X | Y |
---|---|
2 | 3 |
1 | 4 |
3 | 5 |
4 | 4 |
5 | 4 |