\[E(Y|X)=\beta_{0}+\beta_{1}X\]
Social science research often deals with categorical variables: voting versus not voting, democratic versus non-democratic countries, receiving foreign aid versus not, and so on. When the dependent variable is categorical, estimating the coefficients with an OLS linear regression can violate the assumption of homoscedastic errors, so we need to transform the dependent variable into a new linear model.
We will introduce the basic principles of binary-outcome regression, which describes the relationship between the independent variables and a binary dependent variable, for example, between credit card default and the outstanding balance. We also show how to compute the maximum likelihood function in R.
Suppose each random variable \(Y\) is a Bernoulli trial drawn with parameter \(\pi_{i}\); that is, within group \(i\), the number of \(Y=1\) outcomes among the \(N_{i}\) trials equals \(Y_{i}\).
A Bernoulli trial is a single experiment with only two possible outcomes. For example: does a coin toss come up heads or tails? Does a test taker pass or fail? Will it rain tomorrow or not? Does a photo drawn from a sample of 100 web photos contain a pet or not? The expected value of the random variable is \(E[Y]=\pi\) and its variance is \(\pi(1-\pi)\).
If we repeat such a trial \(n\) times, the number of successes \(k\) follows a binomial distribution: \(Y_{i} \sim B(n_{i},\pi_{i})\).
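To make the Bernoulli and binomial setup concrete, here is a minimal R sketch (the value 0.3 and the sample sizes are chosen only for illustration) checking that \(E[Y]=\pi\) and \(\text{Var}(Y)=\pi(1-\pi)\):
set.seed(42)
pi.true <- 0.3
y <- rbinom(10000, size=1, prob=pi.true)   # 10,000 Bernoulli draws
mean(y)    # close to pi = 0.3
var(y)     # close to pi*(1-pi) = 0.21
k <- rbinom(10000, size=20, prob=pi.true)  # sums of 20 Bernoulli trials: B(20, 0.3)
mean(k)    # close to n*pi = 6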
From a machine learning viewpoint, the task is to learn \(f(X)\) from training data so that the outcome can be predicted for new data, for example whether a patient has cancer or whether a student passes a test.
The model for the cumulative count \(Y_{j}\) of \(Y=1\) outcomes out of \(N_{j}\) is written as Equation (2.1):
\[\begin{align*} \frac{Y_{j}}{N_{j}} & = f_{j}\\ & = \sum_{k} \beta_{k}X_{jk}+u_{j} \tag{2.1} \end{align*}\]
\[\text{E}(Y \mid X)=[1\times \text{Pr}(Y=1\mid X)]+[0\times \text{Pr}(Y=0\mid X)]=\text{Pr}(Y=1\mid X)\]
\[\text{Pr}(Y=1 \mid X)=\bf{X^{\prime}\beta}\]
\[\text{Var}(Y\mid X)=\bf{X^{\prime}\beta}(1-\bf{X^{\prime}\beta})\]
In other words, the variance of the error term depends on X rather than being a fixed constant. This violates the assumptions of the least squares regression model, so we cannot estimate the model by OLS.
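A small simulation can make the point visible (every value here is illustrative, not from the text): fit a linear probability model and compare the residual variance at low and high values of X.
set.seed(123)
x <- runif(5000, 0, 1)
p <- 0.05 + 0.5*x                    # true Pr(Y=1|X), kept inside (0,1)
y <- rbinom(5000, 1, p)
u <- resid(lm(y ~ x))                # residuals of the linear probability model
var(u[x < 0.5]); var(u[x >= 0.5])    # the error variance changes with x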
If we imagine the \(Y_{i}\) spread across \(j\) groups, with each \(Y_{i}\) taking only the values 0 and 1, then \(Y_{i} = \sum Y_{ij}\).
The pdf of \(Y_{i}\) is written:
\[\text{Pr}\{Y_{i}=y_{i}\}={n_{i}\choose y_{i}}\pi_{i}^{y_{i}}(1-\pi_{i})^{n_{i}-y_{i}}\]
Suppose \(j\) indexes voting for the KMT or the DPP. With more than two categories, say \(j=3\), we need to estimate \(j-1\) equations.
In the equation above, \(\frac{Y_{j}}{N_{j}}\) equals the probability \(\pi\), with \(0\leq \pi\leq 1\). The least squares fitted values \(\hat{Y_{i}}\), however, can be greater than 1 or less than 0, that is, predictions outside the range of the observed proportion \(p\).
Moreover, the relationship between \(X\) and \(Y\) may not be linear: a one-unit change in \(X\) may change \(Y\) by different amounts at different values of \(X\). Therefore \(\pi\) must be transformed so that predictions stay within the bounds and the relationship becomes linear.
\(\blacksquare\) What kind of person fails to repay credit card debt? Is it people in poor financial shape? Table 2.1 shows estimates from ordinary least squares (OLS):
library(ISLR); data("Default")
Default$default <- as.numeric(ISLR::Default$default)-1
#Default$default <- Default$default #No=0, Yes=1
M1 <- lm(default ~ balance, data=Default)
stargazer::stargazer(M1, title='信用卡債務與財務',
type=ifelse(knitr::is_latex_output(),"latex","html"),
label=knitr::opts_current$get("label"))
Dependent variable: | default |
---|---|
balance | 0.0001*** |
 | (0.00000) |
Constant | -0.075*** |
 | (0.003) |
Observations | 10,000 |
R2 | 0.123 |
Adjusted R2 | 0.122 |
Residual Std. Error | 0.168 (df = 9998) |
F Statistic | 1,397.000*** (df = 1; 9998) |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
Default$predicted<-predict(M1)
plot(seq(1,10000), sort(Default$predicted), type='l',
xlab = '', ylab = "Predicted Values")
Figure 2.1: Sorted predicted values
r.b<-range(Default$balance)
r.b
## [1] 0 2654
xbalance <- seq(0, 2655, 0.1)
ybalance <- predict(M1, list(balance=xbalance), type='response')
plot(Default$default ~ Default$balance, pch=16, ylim=c(-0.2, 1),
col='#33aaeeee', xlab="Balance", ylab = "Predicted Values")
lines(xbalance, ybalance, col='#eebb00', lwd = 3)
Figure 2.2: The independent variable and the linear model's predicted values
M2 <- glm(default ~ balance, data=Default, family =
binomial('logit'))
xbalance <- seq(0, 2655, 0.1)
ybalance <- predict(M2, list(balance=xbalance), type='response')
plot(Default$default ~ Default$balance, pch=16,
col='#33aaeeee', xlab="Balance", ylab = "Predicted Values")
lines(xbalance, ybalance, col='#eebb00', lwd = 3)
Figure 2.3: The independent variable and predicted values from the glm model
\[0\leq\rm{Pr(Y=1|X)}=\beta_{0}+\beta_{1}X\leq 1\quad\forall\hspace{.2em}X\]
Moreover, when OLS is used with a binary dependent variable, the variance of the model error \(\epsilon\) changes with the values of the independent variables rather than staying constant, so the coefficient standard errors are unstable.
Finally, if an OLS model is fit to a binary dependent variable, the errors are not normally distributed; this violates a regression assumption, and test results may be wrong.
Suppose \(Y=\sum X\beta+\epsilon\). Knowing that \(Y\) takes only 0 and 1, we take the logarithm of \(Y\) and write \(Y'=\rm{log}(Y)\), so \(\rm{log}(Y)\equiv Y'\equiv \sum X\beta^{*}+\epsilon\), and we estimate \(\beta^{*}\) rather than \(\beta\).
Here \(\rm{log}(Y)\) is called the link function, written \(F(\cdot)=F(Y)=Y'=\rm{log}(Y) = X\beta+\epsilon\).
This transformation, however, is not as good as the probit or logit, because \(-\infty \leq \rm{log}(Y)\leq 0\). The model we want is as follows.
Because \(y_{i}\sim \text{Bin}(n_{i},\pi_{i})\) with mean \(\pi_{i}\), \(\pi_{i}\) can be linked to a linear model:
\[\pi_{i}=\bf{X^{\prime}\bf{\beta}}\]
\[\beta_{0}+\beta_{1}x_{1}+\ldots+\beta_{k}x_{k}\]
The earlier linear regression model does not apply, because \(\bf{X^{\prime}\bf{\beta}}\) can take any value while \(\pi_{i}\) lies between 0 and 1; the two sides do not match.
We therefore consider other links that transform the whole model. The various link functions are described here.
Rewriting the left-hand side as \(\text{odds}=\frac{\pi_{i}}{1-\pi_{i}}\) gives:
\[\text{log(odds)}_{i}=\text{logit}(\pi_{i})=\bf{X^{\prime}\bf{\beta}}\]
\[\pi_{i}=\text{logit}^{-1}(\eta_{i})=\frac{e^{\eta_{i}}}{1+e^{\eta_{i}}}\]
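As a quick numerical check of the link and its inverse (\(\texttt{plogis()}\) is R's built-in logistic CDF, i.e. the inverse logit, and \(\texttt{qlogis()}\) the logit; eta is an arbitrary linear predictor):
eta <- c(-3, -1, 0, 1, 3)
pi.hat <- exp(eta)/(1 + exp(eta))   # inverse logit computed by hand
pi.hat
all.equal(pi.hat, plogis(eta))      # matches the built-in logistic CDF
qlogis(pi.hat)                      # recovers eta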
\[\rm{log}(a)=\int_{1}^{a}\frac{1}{x}dx\]
\(\bullet\) Logarithms have the following properties:
\[\mathrm{log}\,1=0\] \[\mathrm{log}_{a}a=1\] \[\mathrm{log}(a)+\mathrm{log}(b)=\mathrm{log}(a\cdot b)\] \[\mathrm{log}(a)-\mathrm{log}(b)=\mathrm{log}(a/b)\] \[\mathrm{log}(a)\rightarrow\infty \hspace{.2em}\text{as}\hspace{.2em} a\rightarrow\infty\] \[\mathrm{log}\,0=-\infty\] \[\mathrm{log}\,a^k=k\cdot\mathrm{log}\,a\] \[\mathrm{exp}(\mathrm{log}(a)) = a\] \[\mathrm{log}(\mathrm{exp}(a))=a\] \[\mathrm{log}(e)= 1\]
\[a^{\text{log}_{a}x}=x\]
When \(\text{log}_{a}x=y\), the definition of the logarithm gives \(x=a^y\).
So replacing \(\text{log}_{a}x\) with \(y\) in \(a^{\text{log}_{a}x}\) gives \(a^{y}\), which equals \(x\).
Hence, whenever \(a^{\text{log}_{a}x}=a^y\) holds, \(a^{\text{log}_{a}x}=x\).
Let us verify these properties numerically:
a=10; b=5; k=3
log10(a)
## [1] 1
log(a)+log(b); log(a*b)
## [1] 3.912
## [1] 3.912
log(a)-log(b); log(a/b)
## [1] 0.6931
## [1] 0.6931
log(a^k); k*log(a)
## [1] 6.908
## [1] 6.908
exp(log(a))
## [1] 10
log(exp(b))
## [1] 5
\(\text{log}(x-1) + \text{log}(x+1) = \text{log}_{10}1\); solve for x.
\(2\text{log}\hspace{.2em}x+3\text{log}\hspace{.2em}y=\text{log}\hspace{.2em}z\); if \(z=x^{a}y^{b}\), what are a and b?
\(\frac{\text{log}\hspace{.2em}225}{\text{log}\hspace{.2em}15}\)=?
\(\text{log}_{3}\hspace{.15em}x=5\); what is x?
\(\bullet\) We also use the exponential function to invert logarithms. Exponentials have the following properties:
\[\rm{exp}^{0}=1\] \[\rm{exp}^{1}=2.718282\] \[\rm{exp}^{a-b}=\frac{\rm{exp}^{a}}{\rm{exp}^{b}} \] \[\rm{exp}^{a+b}=\rm{exp}^{a}\rm{exp}^{b} \]
\[\text{exp}^{\text{ln}\hspace{.1em}x}=\text{x}\]
\[\text{ln}({\text{exp}^x})= x\]
a=10; b=1
exp(0)
## [1] 1
exp(a-b); exp(a)/exp(b)
## [1] 8103
## [1] 8103
exp(a+b); exp(a)*exp(b)
## [1] 59874
## [1] 59874
log(exp(1))
## [1] 1
\(e^{2x}=9\); solve for x.
\(\text{ln}(4-y)=2\); solve for y.
Suppose the temperature of a cup of coffee is the following function of time:
\[T=20+50e^{-t/15}\]
What is \(T\) when t=0?
What is \(T\) when t=30?
When \(T=35^{\circ}\)C, what is \(t\)? (A numerical check follows below.)
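A sketch of the arithmetic in R (only a numerical check of the formula above):
T.fun <- function(t) 20 + 50*exp(-t/15)
T.fun(0)          # temperature at t = 0
T.fun(30)         # temperature at t = 30
-15*log(15/50)    # the t that solves 35 = 20 + 50*exp(-t/15)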
\(\bullet\) In Equation (3.1), \(\frac{\pi_{i}}{1-\pi_{i}}\) is called the odds: the probability that an event occurs relative to the probability that it does not.
\(\bullet\) What are odds? They can be written \(\frac{p(x)}{1-p(x)}\). For example:
A \(20\%\) batting average means the odds of a hit against an out are 1:4, because \(\frac{0.2}{1-0.2}=\frac{1}{4}\); equivalently, failing to get on base is 4 times as likely as hitting safely.
A \(90\%\) pass rate means passing is 9 times as likely as failing, because \(\frac{0.9}{1-0.9}=9\).
We therefore call the logistic regression model a binary log-odds model, though some call it an odds ratio model (Gould and Hardin), because the odds are themselves a kind of ratio.
\[\frac{\pi}{1-\pi}=\text{exp}^{X'\beta}\]
\[\frac{\pi_{1}}{(1-\pi_{1})}/\frac{\pi_{0}}{(1-\pi_{0})}\]
\[\begin{align*} \text{exp}^{\beta_{0}+\beta_{1}x_{1}+\beta_{1}}/\text{exp}^{\beta_{0}+\beta_{1}x_{1}} & = \text{exp}^{\beta_{0}+\beta_{1}x_{1}}\times \text{exp}^{\beta_{1}}/\text{exp}^{\beta_{0}+\beta_{1}x_{1}} \\ & = \text{exp}^{\beta_{1}} \end{align*}\]
This result means the odds are multiplied by \(\text{exp}(\beta_{1})\) when \(x_{1}\) increases by one unit.
The odds ratio is fixed, but the odds themselves change with X, as the sketch below illustrates.
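A numerical illustration of this distinction (the probabilities 0.2 and 0.4 are invented for the example): the odds differ at each value of x, but their ratio is the single number \(\text{exp}(\beta_{1})\).
p0 <- 0.2                  # Pr(y=1 | x=0)
p1 <- 0.4                  # Pr(y=1 | x=1)
odds0 <- p0/(1-p0); odds1 <- p1/(1-p1)
odds0; odds1               # the odds change with x
odds1/odds0                # the odds ratio is a fixed number
log(odds1/odds0)           # equals beta1 in the logit model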
Furthermore, because \(\text{exp}(\text{log}(a))=a\), we can define
\[ \eta_{i} \equiv \sum \beta_{k}X_{ik}\]
As noted earlier, \(\pi_{i}\) can be written as: \[\begin{align*} \pi_{i} & =\rm{logit}^{-1}(\eta_{i}) \\ & =\frac{e^{\eta_{i}}}{1+e^{\eta_{i}}} \\ & =\frac{1}{1+e^{-\eta_{i}}} \end{align*}\]
To obtain the last equality, first rewrite \(1+e^{-\eta_{i}}\) as:
\[ \frac{e^{\eta_{i}}}{e^{\eta_{i}}}+\frac{1}{e^{\eta_{i}}}=\frac{1+e^{\eta_{i}}}{e^{\eta_{i}}} \]
This works because \(\text{exp}^0=1\) and \(\text{exp}^{a-b}=\frac{\text{exp}^a}{\text{exp}^b}\).
Substituting back into \(\frac{1}{\frac{1+e^{\eta_{i}}}{e^{\eta_{i}}}}\) yields \(\frac{e^{\eta_{i}}}{1+e^{\eta_{i}}}\).
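The algebra can also be verified numerically; a quick sketch confirming that the two expressions for \(\pi_{i}\) agree for any \(\eta\):
eta <- seq(-5, 5, by=0.5)
a <- 1/(1 + exp(-eta))
b <- exp(eta)/(1 + exp(eta))
all.equal(a, b)    # TRUE: the two forms are identical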
The probability of \(Y\) can be expressed with the logistic or probit cumulative distribution function, denoted by \(F\):
Y<-seq(0, 1, by = 0.01)
Yprime<-car::logit(Y)
plot(Yprime ~ Y, xlab='Y', ylab=expression(paste("Y"^"'")),
type='l', col='#bb2211')
abline(v=0.5, lty=2); abline(h=0, lty=2)
Figure 3.1: Values after the logit transformation
plot(log(Y/(1-Y)) ~ Y, type='l', col='#013ae2')
Figure 3.2: Log odds plotted against Y
Y<-seq(0,1, by=0.01)
plot(qnorm(Y) ~ Y, type='l', col='#1111aa')
Figure 3.3: Probit-transformed values plotted against Y
plot(qnorm(Y) ~ Y, type='l', col='#1111aa')
lines(log(Y/(1-Y)) ~ Y, type='l', col='#a0013ae2')
leg.text=c('Probit','Logit')
legend(0, 2.3, leg.text, col=c('#1111aa','#a0013ae2'), lty=c(1,1), bty='n')
Figure 3.4: Probit and logit transformations of Y
The cumulative distribution function (CDF) of a random variable X (\(-\infty \le X\le \infty\)) gives \(P(X\le x)\). As X increases the CDF increases, ranging from 0 to 1.
Both the logistic and the probit CDFs are S-shaped curves, like the CDF of any random variable, so we borrow them to link the binary dependent variable to the model.
\[F(\bf{X^{\prime}}\beta)=\text{Pr}(Y=1\mid X)\]
\[F(\bf{X^{\prime}}\beta)=\frac{\text{exp}^{\bf{X^{\prime}}\beta}}{1+\text{exp}^{\bf{X^{\prime}}\beta}}\]
\[\Phi(\bf{X^{\prime}}\beta)=\text{Pr}(Y=1\mid X)\]
\[\Phi(z) = P(Z\le z)=P(Z\le \bf{X^{\prime}\beta}),\hspace{.2em}Z\sim N(0,1)\]
\[Y^{*}=\Phi^{-1}(\pi)=\bf{X^{\prime}\beta}\]
\(\Phi^{-1}\) is also called the quantile function: it takes a probability between 0 and 1 and returns the Z value, the distance from the mean 0 in standard deviations under the standard normal distribution. In earlier hypothesis tests and confidence intervals, we similarly found the Z value for a given p value.
In R, \(\texttt{qnorm()}\) returns this Z value, for example \(\texttt{qnorm(0.95)=1.645}\).
So after estimating \(\bf{X^{\prime}\beta}\), applying \(\Phi(\bf{X^{\prime}\beta})\) gives \(\text{Pr}(Y=1\mid X)\). Plugging in different values of \(x\) shows how \(\text{Pr}(Y=1\mid X)\) changes (using \(\texttt{predict()}\)).
The following data illustrate the logit and probit models, plotted in Figure 3.5 to compare the two curves:
library(AER); data(HMDA)
#Models
denyprobit <- glm(deny ~ pirat,
family = binomial(link = "probit"),
data = HMDA)
denylogit <- glm(deny ~ pirat,
family = binomial(link = "logit"),
data = HMDA)
# plot data
plot(x = HMDA$pirat,
y = HMDA$deny,
main = "Probit and Logit Models of the Probability of Denial, Given P/I Ratio",
xlab = "P/I ratio",
ylab = "Deny",
pch = 20,
ylim = c(-0.4, 1.4),
cex.main = 0.9)
# add horizontal dashed lines and text
abline(h = 1, lty = 2, col = "#bb22bb")
abline(h = 0, lty = 2, col = "#bb22bb")
text(2.5, 0.9, cex = 0.8, "Mortgage denied")
text(2.5, -0.1, cex= 0.8, "Mortgage approved")
# add estimated regression line of Probit and Logit models
x <- seq(0, 3, 0.01)
y_probit <- predict(denyprobit, list(pirat = x), type = "response")
y_logit <- predict(denylogit, list(pirat = x), type = "response")
lines(x, y_probit, lwd = 1.5, col = "#CC2211")
lines(x, y_logit, lwd = 1.5, col = "#0000CC", lty = 2)
# add a legend
legend("topleft",
horiz = TRUE,
legend = c("Probit", "Logit"),
col = c("#CC2211", "#0000CC"),
lty = c(1, 2))
Figure 3.5: Predicted values from the logit and probit models
Source: Hanck et al., 2024
The probit model uses the standard normal distribution, \(N(0, 1)\). The logistic distribution has mean 0 and standard deviation roughly 1.8, so multiplying a probit coefficient by about 1.8 approximates the corresponding logistic coefficient.
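The 1.8 rule of thumb comes from the standard deviation of the standard logistic distribution, \(\pi/\sqrt{3}\approx 1.81\). A sketch using the Default data (the object names here are ours; both links are refit for the same model):
pi/sqrt(3)    # sd of the standard logistic distribution
logit.fit <- glm(default ~ balance, data=ISLR::Default, family=binomial('logit'))
probit.fit <- glm(default ~ balance, data=ISLR::Default, family=binomial('probit'))
coef(logit.fit)['balance']/coef(probit.fit)['balance']    # roughly 1.8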
The principle of maximum likelihood estimation (MLE) is, for each pair \((x_{i}, y_{i})\), to try values of \(\beta\) in Pr(\(y_{i}=1\))=\(\Phi(\text{X}\beta)\) so as to maximize the probability of observing Pr(\(y_{i}=1\)). See the explanation by StatQuest.
MLE can also be viewed as finding the distribution, normal, exponential, Gamma, and so on, that best approximates the data, so that data from the same distribution can reproduce the result.
For example, if a trial value of \(\beta\) gives Pr(\(y_{i}=1\))=0.8 and in fact \(y_{i}=1\), the likelihood we obtain is 0.8. If in fact \(y_{i}=0\), the likelihood we obtain is 0.2.
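A sketch of that idea in R (p.hat and y are invented numbers): each observation contributes \(p_{i}\) to the likelihood when \(y_{i}=1\) and \(1-p_{i}\) when \(y_{i}=0\), and the joint likelihood is their product.
p.hat <- c(0.8, 0.8, 0.3)
y <- c(1, 0, 1)
contrib <- ifelse(y==1, p.hat, 1-p.hat)
contrib             # 0.8 0.2 0.3
prod(contrib)       # joint likelihood
sum(log(contrib))   # log-likelihood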
For example, consider a random variable distributed as in Figure 4.1 and assume it is normal. We first try the following two normal distributions:
curve(dnorm(x, 0.35, 0.1), from=-1.5, to=1, col='#3322EE', lwd=2)
curve(dnorm(x, -0.8, 0.2), from=-1.5, to=1, col='#FF22EE', lwd=2, add=T)
x<-c(-1, -0.85, -0.6,-0.50, -0.45, -0.30, -0.25, -0.20, -0.15,
-0.09, 0.00, 0.10, 0.16, 0.30, 0.50)
y<-rep(0, 15)
points(x,y, pch=16, cex=2, col='gray30')
Figure 4.1: Two normal distributions
Clearly the centers and dispersions of these two distributions are somewhat off from the actual data. The farther from the data's center and dispersion, the smaller the likelihood.
So when we speak of the MLE of the data, we mean maximizing the degree to which the observed data match a particular distribution. Trying a different normal distribution, Figure 4.2 shows the points less dispersed around it:
curve(dnorm(x, -0.22, 0.38), from=-1.5, to=1, col='#FF3333',
lwd=2, xlab='y')
y<-c(-1, -0.85, -0.6,-0.50, -0.45, -0.30, -0.25, -0.20, -0.15,
-0.09, 0.00, 0.10, 0.16, 0.30, 0.50)
y2<-rep(0, 15)
points(y, y2, pch=16, cex=2, col='gray30')
abline(v=-0.22, lty=2, lwd=1.5)
Figure 4.2: A third normal distribution
The probability density function of the normal distribution is:
\[\rm{Pr}(y| \mu, \sigma)=\frac{1}{\sqrt{2 \pi\sigma^2}}e^{-(y-\mu)^2/2\sigma^2}\]
\[\begin{equation*} \mathcal{L}(\mu, \sigma|y_{i}) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-(y-\mu)^2/2\sigma^2} \end{equation*}\]
For given \(\sigma\) and \(y\), we keep trying values of \(\mu\) to maximize the likelihood. The likelihood values traced out by these \(\mu\) form a normal-shaped curve; at the maximum the curve peaks with slope 0, and the corresponding \(\mu\) is, in theory, the sample mean.
Suppose there is a single observation, 20, and \(\sigma=2\). Plugging 300 values of \(\hat{\mu}\) into the function and plotting the result gives Figure 4.3:
y<-20; sigma=2; mu<-seq(10, 30, length.out = 300)
L <- c()
n=length(mu)
for (i in 1: n){
L[i] <- 1/(sqrt(2*pi*sigma^2))*exp(-((y-mu[i])^2)/(2*sigma^2))
}
plot(mu, L, type='l', lwd=2, col='#FF3322', xlab=expression(hat(mu)))
abline(h=max(L), lty=2, lwd=2, col='#998833')
Figure 4.3: Maximizing the likelihood
After finding \(\mu\), we hold \(\mu\) and \(y\) fixed and keep trying values of \(\hat{\sigma}\) to maximize the likelihood; the corresponding curve likewise peaks, with slope 0, at the maximum.
With \(n\) observations \(y\), the likelihood is a product:
\[\mathcal{L}(\mu, \sigma|y_{1},\ldots,y_{n})=\mathcal{L}(\mu, \sigma|y_{1})\times\cdots\times\mathcal{L}(\mu, \sigma|y_{n})\]
\(\bullet\) Recall: \(\rm{log}(a^k)=k\cdot\rm{log}(a)\) and \(\rm{log}(e)=1\).
Equation (4.6) shows that the likelihood function is maximized when \(\mu=\frac{\sum_{i}^{n}y_{i}}{n}\), the sample mean.
Next we compute the \(\sigma\) at which Equation (4.5) equals 0:
library(tibble); library(ggplot2)  # tibble() and ggplot() require these
obs<-c(6, 5, 10, 3, 1, 8, 2, 9, 0, 2.5, 3.5, 4.2, 4.5, 4.9)
dat <- tibble(x=obs, y=0)
p <- ggplot(data = dat, aes(x = x, y = y)) + geom_point(color='#CB1D00')
p
Figure 4.4: 14 observations
x<-c(6, 5, 10, 3, 1, 8, 2, 9, 0, 2.5, 3.5, 4.2, 4.5, 4.9)
y<-rep(0, 14)
plot(x, y, ylim=c(0, 0.3), xlim=c(-1,13))
curve(dnorm(x, 4, 2.8), from=-1, to=11, col='#3322EE', lwd=2, add=T)
curve(dnorm(x, 2, 1.5), from=-1, to=11, col='#FF22EE', lwd=2, add=T)
points(x,y, pch=16, cex=2, col='gray30')
leg.text=c(expression(paste(mu,'=4,', ' ',sigma,'=2.8')),
expression(paste(mu,'=2,', ' ',sigma,'=1.5')))
legend(8, 0.28, leg.text, col=c('#3322EE','#FF22EE'), lty=c(1,1), bty='n')
Figure 4.5: Normal distributions over the 14 observations
library(data.table); library(ggplot2)
set.seed(5000)
obs<-c(6, 5, 10, 3, 1, 8, 2, 9, 0, 2.5, 3.5, 4.2, 4.5, 4.9)
logL<-function(p) sum(log(dnorm(obs,p[1],p[2]))) # p=c(p[1],p[2])=c(mu,sigma)
library('plyr')
dLogL<-expand.grid(museq=seq(2, 7, 0.1),sigmaseq=seq(2,4,0.1))
dLogL<-ddply(dLogL,.(museq, sigmaseq),
transform,logLike=logL(c(museq,sigmaseq)))
#Graph
ggplot(data=dLogL, aes(x=museq,y=logLike,
color=factor(sigmaseq)))+ geom_line()
Figure 4.6: Maximizing the function, part one
DT<-data.frame(dLogL)
# replace infinite log-likelihood values with NA
is.na(DT) <- sapply(DT, is.infinite)
min_value<-min(DT$logLike, na.rm=TRUE)
max_value<-max(DT$logLike, na.rm=TRUE)
DT[which(DT$logLike==max_value),]
## museq sigmaseq logLike
## 534 4.5 2.8 -34.39
library(dplyr)  # group_by() and filter() require this
dt <- DT %>% group_by(sigmaseq)
#mean(obs) =4.5
#fix mu=4.5
dt <- dt %>% filter(museq==4.5)
ggplot(data=dt, aes(x=sigmaseq, y=logLike))+
geom_line(col='#339999', size=1) +
geom_hline(aes(yintercept=max(logLike, na.rm=T)),
color="red", linetype="dashed", size=1) +
labs(x=expression(hat(sigma)),y='Log Likelihood') +
scale_x_continuous(labels=seq(2,4, by=0.1),
breaks=seq(2,4, by=0.1))
Figure 4.7: Maximizing the function, part two
#Negative log-likelihood
negLogL<-function(p) -logL(p)
estPar<-optim(c(20,10),negLogL)
MLEparameters<-estPar$par
MLEparameters
## [1] 4.543 2.821
\[\hat{\pi}=\frac{\sum_{i=1}^n y_{i}}{n} \]
The sampling distribution of the maximum likelihood estimator approaches a normal distribution whose covariance is given by the inverse of the negative Hessian matrix \(H\) of the log-likelihood.
\(\hat{\pi}=\frac{\sum_{i=1}^n y_{i}}{n}\) means the sample mean maximizes the likelihood function.
Take 10 Bernoulli values as an example. Maximizing the likelihood and plotting Figure 4.8, the curve peaks near \(p=0.696\), and the mean of these 10 values equals 0.7 (see White).
set.seed(1000)
p.parameter <- 0.6
sequence <- rbinom(10, 1, p.parameter)
log.likelihood.sum <- function(sequence, p) {
log.likelihood <- sum(log(p)*(sequence==1)) + sum(log(1-p)*(sequence==0))
}
possible.p <-seq(0,1, length.out=100)
#jpeg('Likelihood_Concavity.jpg')
library('ggplot2')
g1<-qplot(possible.p,
sapply(possible.p, function (p) {log.likelihood.sum(sequence, p)}),
geom = 'line',
main = 'Likelihood as a Function of p',
xlab = 'p',
ylab = 'Likelihood')
g1 + scale_x_continuous(labels=seq(0,1, by=0.1),
breaks=seq(0,1, by=0.1))
Figure 4.8: Maximized likelihood for 10 Bernoulli values
#dev.off()
dt<-data.frame(p=possible.p,
value=sapply(possible.p, function (p) {log.likelihood.sum(sequence, p)}))
dt<-dt[-c(1,100),]
m.value=max(dt$value)
dt[which(dt$value==m.value),]
## p value
## 70 0.697 -6.109
We now show how R's optimize function applies to a binary variable. First write down the log-likelihood function.
def <- as.numeric(ISLR::Default$default)
y<-c()
y[def==2]<-1
y[def==1]<-0
N<-1
logL <- function(p) sum(log(dbinom(y, N, p)))
logL(0.1)
## [1] -1785
optimize(logL, lower=0, upper=1, maximum=TRUE)
## $maximum
## [1] 0.0333
##
## $objective
## [1] -1460
mean(y)
## [1] 0.0333
\[-\frac{n}{2}\rm{ln}(2\pi)-\it{n}\rm{ln}(\sigma)-\frac{(y_{1}-\mu)^2}{2\sigma^2}-\ldots-\frac{(y_{n}-\mu)^2}{2\sigma^2}\]
\[\text{ln}\,\mathcal{L}(X|\mu,\sigma^2)=\frac{-n}{2}\rm{ln}(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum (x_{i}-\mu)^2\]
# specify the single value normal probability function
norm_lik = function(x, mu, s){
y = 1/sqrt(2*pi*s^2)*exp((-1/(2*s^2))*(x-mu)^2)
}
# and plot it just to make sure
xval=seq(-3,3, length.out = 600)
yval=sapply(seq(-3,3, length.out=600),FUN=norm_lik,mu=0,s=1)
plot(xval, yval, type='l', ylab='',xlab='', main='Normal Distribution',
col='#FF22EE', lwd=1.5)
Figure 4.9: Density of the normal distribution
# Likelihood function
llik = function(x,par){
m=par[1]
s=par[2]
n=length(x)
# log of the normal likelihood
# -n/2 * log(2*pi*s^2) + (-1/(2*s^2)) * sum((x-m)^2)
ll = -(n/2)*(log(2*pi*s^2)) + (-1/(2*s^2)) * sum((x-m)^2)
# return the negative to maximize rather than minimize
return(-ll) }
set.seed(270511)
xvec<-rnorm(600, 1.5, sqrt(pi))
# call optim with the starting values 'par', # the function (here 'llik'),
# and the observations 'x'
res0 = optim(par=c(.5,.5), llik, x=xvec)
#Result
res=rbind('observation'=c('mean'=mean(xvec),
'sd'=sd(xvec)),
'optim'=res0$par)
library(kableExtra)  # kable_styling() requires this
knitr::kable(res)%>%
kable_styling(bootstrap_options = "striped", full_width = F)
 | mean | sd |
---|---|---|
observation | 1.54 | 1.768 |
optim | 1.54 | 1.767 |
The estimates reach their maximum at mean 1.54 and standard deviation 1.77, which is exactly our observed data.
During maximization, a larger likelihood means the data better match the assumed distribution; equivalently, the smaller the negative log-likelihood, the better.
We can use R to estimate the coefficients of a binary regression model by maximum likelihood, but we first need to know which likelihood function to maximize.
First recall that \(a\cdot\rm{ln}(x)=\rm{ln}(x^a)\); with \(a=-1\), \(\rm{ln}(\frac{1}{x})=\rm{ln}(x^{-1})=-\rm{ln}(x)\).
Second, from (3.3), the following relationship holds:
\[p(y;X,\beta)=(1+\rm{exp}(-X\beta))^{-1}\]
\[\begin{align*} L(\beta)&=\rm{ln}\prod\limits_{i=1}^{n}p^{y_{i}}(1-p)^{1-y_{i}}\\ &=\sum\limits_{i=1}^{n}(y_{i}\rm{ln}(p)+(1-y_{i})\rm{ln}(1-p))\\ &=\sum\limits_{i=1}^{n}(y_{i}\rm{ln}(\frac{1}{1+\rm{exp}(-X\beta)})+(1-y_{i})\rm{ln}(1-(\frac{1}{1+\rm{exp}(-X\beta)})))\\ &=-\sum\limits_{i=1}^{n}(y_{i}\rm{ln}(1+\rm{exp}(-X\beta))+(1-y_{i})\rm{ln}(1+\rm{exp}(X\beta))) \tag{5.1} \end{align*}\]
set.seed(02139)
# Dep. V.
y<- rbinom(1000, 1, prob = 0.67)
# Ind. V.
x <- cbind(X1=rnorm(1000, 1, 0.2))
# Likelihood function for logit
llk.logit <- function(param,y,x) {
os <- rep(1,length(x[,1]))
x <- cbind(os,x)
b <- param[ 1 : ncol(x) ]
xb <- x%*%b
sum( y*log(1+exp(-xb)) + (1-y)*log(1+exp(xb)));
# optim is a minimizer, so min -ln L(param|y)
}
# Fit logit model using optim
ls.result <- lm(y~x) # use ls estimates as starting values
stval <- ls.result$coefficients # initial guesses
logit.result <- optim(stval,llk.logit,method="BFGS",hessian=T,y=y,x=x)
# call minimizer procedure
pe.1 <- logit.result$par # point estimates
vc.1 <- solve(logit.result$hessian) # var-cov matrix
se.1 <- sqrt(diag(vc.1)) # standard errors
ll.1 <- -logit.result$value # likelihood at maximum
#Result
res=rbind('Estimates'=rev(pe.1),
'Std. Error'=rev(se.1),
'Log likelihood'=ll.1)
knitr::kable(res)%>%
kable_styling(bootstrap_options = "striped", full_width = F)
 | x | (Intercept) |
---|---|---|
Estimates | -0.1773 | 0.9556 |
Std. Error | 0.3310 | 0.3410 |
Log likelihood | -622.8996 | -622.8996 |
m1<-glm(y~x, family = binomial('logit'))
stargazer::stargazer(m1, title='二元勝算對數迴歸模型',
type=ifelse(knitr::is_latex_output(),"latex","html"),
label=knitr::opts_current$get("label"))
Dependent variable: | y |
---|---|
x | -0.177 |
 | (0.331) |
Constant | 0.956*** |
 | (0.341) |
Observations | 1,000 |
Log Likelihood | -622.900 |
Akaike Inf. Crit. | 1,250.000 |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
#logLik(m1)
\[\hat{\bf{\beta}} = (\bf{X^{\prime}WX})^{-1}\bf{X^{\prime}Wz}\]
\[w_{i}=\hat{u_{i}}(n_{i}-\hat{u_{i}})/n_{i}\]
\[\text{var}(\hat{\beta})=(\bf{X^{\prime}WX})^{-1}\]
We usually exponentiate the coefficients of a binary logit model and interpret the result as follows: when the variable changes by one unit, the odds are multiplied by \(\text{exp}(\beta)\), or increase by \(100\times(\text{exp}(\beta)-1)\%\). Where does this come from?
Suppose we have one independent variable X and one dependent variable Y, both taking only the values 0 and 1.
When X=0, \(\text{log}\Big(\frac{\pi_{i}}{1-\pi_{i}}\Big)_{0}=\alpha+\beta X=\alpha\)
When X=1, \(\text{log}\Big(\frac{\pi_{i}}{1-\pi_{i}}\Big)_{1}=\alpha+\beta X=\alpha+\beta\)
\[\beta = \text{log}\Big(\frac{\pi_{i}}{1-\pi_{i}}\Big)_{1}-\text{log}\Big(\frac{\pi_{i}}{1-\pi_{i}}\Big)_{0}=\text{log}(\text{odds}\hspace{.2em}\text{ratio}) \]
\[\text{exp}(\beta)=\text{exp}(\text{log}\hspace{.15em}\text{OR})=\text{OR}\]
\(\blacksquare\) Is personal finance related to failing to repay credit card debt? We estimate a binary logit model with the \(\tt{glm()}\) function; note the setting \(\tt{family=binomial(logit)}\).
library(janitor); library(ISLR)
tab <- Default %>% tabyl(student, default) %>%
adorn_totals(where = c('row', 'col'))
tab
## student 0 1 Total
## No 6850 206 7056
## Yes 2817 127 2944
## Total 9667 333 10000
Because \(\eta_{i}=\text{log}\frac{\pi_{i}}{1-\pi_{i}}\), two groups can be compared by \(\eta_{i,1}-\eta_{i,2}\).
So the log odds of default for students is \(\text{log}(127/2817)\), and for non-students \(\text{log}(206/6850)\); subtracting one from the other gives
\[\beta = \text{log}\Big(\frac{\pi_{i}}{1-\pi_{i}}\Big)_{1}-\text{log}\Big(\frac{\pi_{i}}{1-\pi_{i}}\Big)_{0}=\text{log}(\text{odds}\hspace{.2em}\text{ratio}) \]
which turns out to equal \(\hat{\beta}\) exactly.
The difference in log odds equals the log of the odds ratio:
log_oddsratio <- log(127/2817) - log(206/6850)
cat('log of odds ratio:', log_oddsratio)
## log of odds ratio: 0.4049
library(ISLR)
M1 <- glm(default ~ student, data=Default, family=binomial(logit))
stargazer::stargazer(M1, title='信用卡債務與學生身份',
type=ifelse(knitr::is_latex_output(),"latex","html"),
label=knitr::opts_current$get("label"))
Dependent variable: | default |
---|---|
studentYes | 0.405*** |
 | (0.115) |
Constant | -3.504*** |
 | (0.071) |
Observations | 10,000 |
Log Likelihood | -1,454.000 |
Akaike Inf. Crit. | 2,913.000 |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
library(ISLR)
M2 <- glm(default ~ balance, data=Default, family=binomial(logit))
stargazer::stargazer(M2, title='信用卡債務與財務',
type=ifelse(knitr::is_latex_output(),"latex","html"),
label=knitr::opts_current$get("label"))
Dependent variable: | default |
---|---|
balance | 0.005*** |
 | (0.0002) |
Constant | -10.650*** |
 | (0.361) |
Observations | 10,000 |
Log Likelihood | -798.200 |
Akaike Inf. Crit. | 1,600.000 |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
Table 5.2 shows \(\hat{\beta}_{1}=0.0055\): a one-unit increase in the credit card balance (balance) raises the log odds of default by 0.0055; equivalently, the odds of default versus no default are multiplied by \(\text{exp}^{0.0055}=1.005515\).
Figure 5.1 plots \(\frac{p(X)}{1-p(X)}\) against the independent variable:
a=-10.651; b=0.0055
y=exp(a)*exp(b*ISLR::Default$balance)
yi=sort(y)
xi=sort(ISLR::Default$balance)
dt <-data.frame(yi, xi)
ggplot(dt, aes(x=xi, y=yi)) +
geom_point(col='#CC9933')
Figure 5.1: Odds and the independent variable
a=-10.651; b=0.0055
xi=sort(ISLR::Default$balance)
ETA=a + b*xi
pi=exp(ETA)/(1+exp(ETA))
dt <-data.frame(x=xi, y=pi)
ggplot(dt, aes(x=xi, y=pi)) +
geom_point(col='#FF9933')
Figure 5.2: Probability as a function of the linear predictor
\[\text{odds(male)}=\frac{0.8}{0.2}=4\]
\[\text{odds(female)}=\frac{0.2}{0.8}=0.25\]
So \(\text{exp}^{\beta_{1}}\) is the factor by which the odds of y=1 at x=1 exceed the odds of y=1 at x=0. This interpretation is especially suited to models with a binary independent variable.
For example, Table 5.4 uses student status as the independent variable to predict credit card default:
library(ISLR)
#Default$default <- as.numeric(ISLR::Default$default)-1
M3 <- glm(default ~ student, data=Default, family=binomial(logit))
stargazer::stargazer(M3, title='學生身份與信用卡違約',
type=ifelse(knitr::is_latex_output(),"latex","html"),
label=knitr::opts_current$get("label"))
Dependent variable: | default |
---|---|
studentYes | 0.405*** |
 | (0.115) |
Constant | -3.504*** |
 | (0.071) |
Observations | 10,000 |
Log Likelihood | -1,454.000 |
Akaike Inf. Crit. | 2,913.000 |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
M2 <- glm(default ~ balance, data=Default, family=binomial(logit))
M4 <- glm(default ~ balance + student, data=Default, family=binomial(logit))
stargazer::stargazer(M1, M2, M4, title='雙自變數的Logit模型',
type=ifelse(knitr::is_latex_output(),"latex","html"),
label=knitr::opts_current$get("label"))
 | (1) | (2) | (3) |
---|---|---|---|
Dependent variable: | default | default | default |
studentYes | 0.405*** |  | -0.715*** |
 | (0.115) |  | (0.148) |
balance |  | 0.005*** | 0.006*** |
 |  | (0.0002) | (0.0002) |
Constant | -3.504*** | -10.650*** | -10.750*** |
 | (0.071) | (0.361) | (0.369) |
Observations | 10,000 | 10,000 | 10,000 |
Log Likelihood | -1,454.000 | -798.200 | -785.800 |
Akaike Inf. Crit. | 2,913.000 | 1,600.000 | 1,578.000 |
Note: | *p<0.1; **p<0.05; ***p<0.01 |  |  |
x=0.25; mu=1; sd=2
pnorm(x, mu, sd)
## [1] 0.3538
x<-seq(-5, 5, length.out=1000)
CDF <- c()
for (i in 1: 1000){
CDF[i] = pnorm(x[i], 1, 2)
}
dt <- data.frame(x, CDF)
library(ggplot2)
ggplot(data=dt, aes(x=x, y=CDF)) +
geom_line(col='#CC33FF', size=1.5)
Figure 6.1: Cumulative distribution function
Figure 6.1 shows that the cumulative probability of a normal distribution lies between 0 and 1, written \(\Phi(Z)\in [0,1]\); the curve plotted here is the CDF of \(N(1, 2)\), which reaches \(50\%\) exactly at its mean, \(x=1\).
Since the dependent variable is a binary variable \(Y_{i}=0\) or 1, it follows a Bernoulli distribution with success probability \(P\) (summed across \(N\) trials, a binomial), written \(P_{i}(Y_{i}=1)\).
\[ \text{Pr(y=1|x)}=F(Z)=\int_{-\infty}^{\beta_{0}+X\beta_{1}}\frac{1}{\sqrt{2\pi}}\text{exp}(-u^2/2)du\equiv \Phi (Z) \]
The probit function transforms \(X\beta\) into a \(Z\) score of the normal distribution, so the larger \(X\beta\) is, the larger Pr(Y=1).
The probit model can be written as:
\[\rm{Pr}(y=1|X)=\Phi(X\beta)\]
set.seed(100)
n=3000; k=1000; p=runif(1,0.1,0.99); d=rnorm(1, 1, 0.01)
V=rbinom(n, k, p)
Vy<-c()
for (i in 1: n){
if(V[i]>=k*p*d)
Vy[i]=1
else(Vy[i]=0)
}
table(Vy)
## Vy
## 0 1
## 1289 1711
X=rnorm(n, p*d, sqrt(p*(1-p)))
fit1 <- glm(Vy ~ X, family=binomial(probit) )
stargazer::stargazer(fit1, title='模擬資料中的自變數作用',
type=ifelse(knitr::is_latex_output(),"latex","html"),
label=knitr::opts_current$get("label"))
Dependent variable: | Vy |
---|---|
X | -0.094** |
 | (0.048) |
Constant | 0.211*** |
 | (0.029) |
Observations | 3,000 |
Log Likelihood | -2,048.000 |
Akaike Inf. Crit. | 4,099.000 |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
set.seed(100)
n=3000; k=1000; p=runif(1,0.1,0.99); d=rnorm(1, 1, 0.01)
newCDF <- c()
xval=sort(X)
for (i in 1: n){
newCDF[i] = pnorm(0.210-0.094*X[i], p*d, sqrt(p*(1-p)))
}
Z <-sort(newCDF)
dt<-data.frame(Vy, X, xval, Z)
library(ggplot2)
G=ggplot(data=dt, aes(x=xval, y=Vy)) +
geom_point(size=1.5)
G+geom_line(data=dt, aes(x=xval, y=Z),
col='#33EEFF', size=1.2)
Figure 6.2: Z values for the simulated data
\[\frac{1}{\sqrt{2\pi}}\int \text{exp}\Big(\frac{-1}{2}Z^2\Big)dZ\]
# Define function for the standard normal PDF
pdf_func <- function(x) {
dnorm(x, mean = 0, sd = 1)
}
# Generate x-axis values
x <- seq(from = -3, to = 3, by = 0.1)
# Calculate PDF values
y <- pdf_func(x)
# Create the plot
plot(x, y, type = "l", xlab = "Value (Z)", ylab = "Density", main = "PDF of Standard Normal Distribution", col = '#a11cbb')
Figure 6.3: Probability density function
# Define function for the standard normal CDF (pnorm in R)
pnorm_func <- function(x) {
pnorm(x, mean = 0, sd = 1)
}
# Generate x-axis values
x <- seq(from = -5, to = 5, by = 0.1)
# Calculate CDF values
y <- pnorm_func(x)
# Create the plot
plot(x, y, type = "l", xlab = "Value (Z)", ylab = "P(Z <= x)", main = "CDF of Standard Normal Distribution", col = '#a11cbb')
Figure 6.4: Cumulative distribution function
\(\bullet\) When X\(\beta\)=-2, Pr(y=1)=\(\Phi(-2)=0.022\)
\(\bullet\) When X\(\beta\)=-1, Pr(y=1)=\(\Phi(-1)=0.159\)
\(\bullet\) When X\(\beta\)=2, Pr(y=1)=\(\Phi(2)=0.977\)
Computing these results in R:
pnorm(-2, 0, 1)
## [1] 0.02275
pnorm(-1, 0, 1)
## [1] 0.1587
pnorm(2, 0, 1)
## [1] 0.9772
j<-c()
k<-seq(-3, 3, 0.1)
n <- length(k)
for (i in 1:n)
{
j[i] <- pnorm (k[i], 0, 1)
}
plot(k, j, main="CDF", xlab=expression(paste('X', beta)),
ylab="Probability", col = '#22aa00',
type="l", lwd=1.5)
Figure 6.5: CDF of the standard normal distribution
From the above, \(0<\Phi(Z)<1\), and the cumulative probabilities form an S-shaped curve.
When x increases by one unit, the Z score inside \(\Phi(\cdot)\) increases by \(\beta\); how much Pr(y=1) changes depends on where on the curve we start.
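Because the curve is S-shaped, the same \(\beta\) moves Pr(y=1) by different amounts at different points. A sketch of the probit marginal effect \(\phi(X\beta)\cdot\beta\), with an illustrative slope of 1:
beta <- 1
xb <- c(-2, 0, 2)
dnorm(xb)*beta    # largest near Xb = 0, small in the tails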
# Declare the number of observations to create.
nobs=1000
x1 = rnorm(nobs)*.15^.5
x2 = rnorm(nobs)*.35^.5
u = rnorm(nobs)*(.5)^.5
inside = 2*x1+2*x2+u
p = pnorm(inside)
y = rbinom(nobs,1,p)
mydata = data.frame(y=y,x1=x1,x2=x2)
X <- as.matrix(mydata[,2:3])
ll.probit <- function(beta, y=y, X=X){
if(sum(X[,1]) != nrow(X)) X <- cbind(1,X)
phi <- pnorm(X%*%beta, log = TRUE)
opp.phi <- pnorm(X%*%beta, log = TRUE, lower.tail = FALSE)
logl <- sum(y*phi + (1-y)*opp.phi)
return(logl)
}
## Estimate the MLE:
p.res <- optim(par = rep(0, ncol(X) + 1),
fn = ll.probit,
y = y,
X = X,
control = list(fnscale = -1),
hessian = T,
method = "BFGS")
names(p.res$par) <- c('Intercept', colnames(X))
# Extract the coefficients from optim
coefs <- p.res$par
# Calculate the variance
varcov <- -solve(p.res$hessian)
# Calculate the standard errors
ses <- sqrt(diag(varcov))
# Log lik
ll.1 <- p.res$value
#Result
res=rbind('Estimates' = coefs,
'Std. Error'= ses,
'Log likelihood'=ll.1)
knitr::kable(res)%>%
kable_styling(bootstrap_options = "striped", full_width = F)
 | Intercept | x1 | x2 |
---|---|---|---|
Estimates | 0.0001 | 1.4449 | 1.7239 |
Std. Error | 0.0475 | 0.1319 | 0.1116 |
Log likelihood | -462.3379 | -462.3379 | -462.3379 |
pm1<-glm(y~x1+x2, family = binomial('probit'))
stargazer::stargazer(pm1, title='二元單元機率迴歸模型',
type=ifelse(knitr::is_latex_output(),"latex","html"),
label=knitr::opts_current$get("label"))
Dependent variable: | y |
---|---|
x1 | 1.445*** |
 | (0.136) |
x2 | 1.724*** |
 | (0.111) |
Constant | 0.0002 |
 | (0.047) |
Observations | 1,000 |
Log Likelihood | -462.300 |
Akaike Inf. Crit. | 930.700 |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
#logLik(m1)
#ucla.url <- "https://stats.idre.ucla.edu/stat/data"
#admitlink<- paste(ucla.url, "binary.csv", sep = "/")
#download.file(url=admitlink, destfile = "./data/uclaadmit.csv" )
addt <- here::here('data', 'binary.txt')
Admit <-read.table(addt, header = T, sep = ',')
Admit$rank.f<- factor(Admit$rank)
knitr::kable(head(Admit),format = 'simple',
booktabs=T,caption='UCLA申請入學')
admit | gre | gpa | rank | rank.f |
---|---|---|---|---|
0 | 380 | 3.61 | 3 | 3 |
1 | 660 | 3.67 | 3 | 3 |
1 | 800 | 4.00 | 1 | 1 |
1 | 640 | 3.19 | 4 | 4 |
0 | 520 | 2.93 | 4 | 4 |
1 | 760 | 3.00 | 2 | 2 |
fitadmit <- glm(admit ~ gre + gpa + rank.f, data = Admit, family = binomial("probit"))
stargazer::stargazer(fitadmit, title='被錄取與否的Probit模型',
type=ifelse(knitr::is_latex_output(),"latex","html"),
label=knitr::opts_current$get("label"))
Dependent variable: | admit |
---|---|
gre | 0.001** |
 | (0.001) |
gpa | 0.478** |
 | (0.197) |
rank.f2 | -0.415** |
 | (0.195) |
rank.f3 | -0.812*** |
 | (0.208) |
rank.f4 | -0.936*** |
 | (0.245) |
Constant | -2.387*** |
 | (0.674) |
Observations | 400 |
Log Likelihood | -229.200 |
Akaike Inf. Crit. | 470.400 |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
\[F(-2.387+0.001*\rm{GRE}+0.478*\rm{GPA}-0.415*\rm{rank2}-0.812*\rm{rank3}-0.936*\rm{rank4})\]
\[F(-2.387+0.001*500)=0.0295\]
Computing in R:
pnorm(-2.387+0.001*500)
## [1] 0.02958
\[F(-2.387+0.001*600)=0.0369\]
pnorm(-2.387+0.001*600)
## [1] 0.03697
\[F(-2.387+0.001*700)=0.0458\]
pnorm(-2.387+0.001*700)
## [1] 0.0458
The same calculation can plug in the mean GPA or other GRE scores to obtain different cumulative probabilities.
Figure 6.6 shows how the admission probability for students of different ranks rises as the GRE score increases, at several GPA levels:
newdata <- data.frame(gre = rep(seq(from = 200, to = 800, length.out = 100),
4 * 4), gpa = rep(c(2.5, 3, 3.5, 4), each = 100 * 4),
rank.f= factor(rep(rep(1:4,
each = 100), 4)))
newdata[, c("p", "se")] <- predict(fitadmit, newdata, type = "response", se.fit = TRUE)[-3]
ggplot(newdata, aes(x = gre, y = p, colour = rank.f)) + geom_line() +
facet_wrap(~gpa)
Figure 6.6: Admission probability and test scores
\[\text{ln}(\frac{\pi_{i}}{1-\pi_{i}})=\bf{X^{\prime}\beta} \]
\[\pi_{i}=\frac{\text{exp}(X^{\prime}\beta)}{1+\text{exp}(\bf{X^{\prime}\beta)}}\]
\[\hat{\beta}=\bf{(X^{T}WX)^{-1}X^{T}Wz} \]
- \(\bf{W}\) is a diagonal matrix built from the estimated means \(\hat{u}_{i}\):
\[w_{ii}=\hat{u}_{i}(n_{i}-\hat{u}_{i})/n_{i}\]
\(n_{i}\) is a normalization factor.
- The vector \(\bf{z}\) represents:
\[z_{i}=\hat{\eta}_{i}+\frac{y_{i}-\hat{u}_{i}}{\hat{u}_{i}(n_{i}-\hat{u}_{i})}n_{i}\]
\(z\) is the residual divided by its estimated standard deviation, that is, a standardized residual.
IRLS (iteratively reweighted least squares) is an iterative algorithm that repeatedly fits weighted least squares regressions, turning the GLM estimating equations into a sequence of linear regression problems. More on IRLS is available here, and a minimal sketch follows.
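A minimal IRLS sketch for the logit model on simulated data (the data and starting values are ours; the update mirrors \(\hat{\beta}=\bf{(X^{T}WX)^{-1}X^{T}Wz}\) above):
set.seed(1)
x <- rnorm(200); y <- rbinom(200, 1, plogis(-0.5 + 1.2*x))
X <- cbind(1, x)
b <- c(0, 0)                                 # starting values
for (it in 1:25) {
  eta <- as.vector(X %*% b)
  mu  <- plogis(eta)                         # fitted probabilities
  W   <- diag(mu*(1 - mu))                   # weight matrix
  z   <- eta + (y - mu)/(mu*(1 - mu))        # working response
  b   <- solve(t(X) %*% W %*% X, t(X) %*% W %*% z)
}
cbind(IRLS=as.vector(b), glm=coef(glm(y ~ x, family=binomial('logit'))))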
\[\hat{\beta}\sim N\bf{(\beta, \phi(X^{T}WX)^{-1})} \]
\[ (\hat{\pi}-z_{\alpha/2}\widehat{SE},\hspace{1em}\hat{\pi}+z_{\alpha/2}\widehat{SE}) \] and \[\frac{\hat{\beta_{j}} - \beta_{j}}{\widehat{SE}}\sim z \]
We can carry out the same estimation in R; take particular care to drop missing values first.
\[\lambda=\frac{\mathcal{L}(\theta)}{\mathcal{L}(\hat{\theta})} \]
\[-2\hspace{.1em}\text{log}\hspace{.1em}\lambda\hspace{.5em}\xrightarrow{d}\hspace{.5em}\chi^2\]
\[-2\text{log}\frac{L(\hat{\beta}|\beta^{1}=\beta^{1}_{0})}{L(\hat{\beta})}\leq \chi^2_{1-\alpha, q} \]
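A sketch of this likelihood-ratio test in R, refitting the restricted and full models for the Default data (M2 and M4 as in the tables above):
M2 <- glm(default ~ balance, data=ISLR::Default, family=binomial('logit'))
M4 <- glm(default ~ balance + student, data=ISLR::Default, family=binomial('logit'))
lr <- as.numeric(-2*(logLik(M2) - logLik(M4)))   # -2 log(lambda)
lr
pchisq(lr, df=1, lower.tail=FALSE)   # p-value; anova(M2, M4, test='Chisq') agrees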
We can extend the logit regression model above to several variables: \[ \text{log}(\frac{p(X)}{1-p(X)})=\beta_{0}+\beta_{1}X_{1}+\ldots+\beta_{p}X_{p} \]
With more than one variable, interpreting the effect of any single variable requires stressing that the other variables are held at the same level.
When computing predicted values, every variable except the one of interest must be set to a specific value, for example its mean or mode.
\(\blacksquare\) What factors affect credit card default? Table 8.1 shows the estimates from a binary logit model:
M5 <-glm(default ~ balance+income+student, data=Default, family=binomial(logit))
stargazer::stargazer(M5, title='影響信用卡違約的模型',
type=ifelse(knitr::is_latex_output(),"latex","html"),
label=knitr::opts_current$get("label"))
Dependent variable: | default |
---|---|
balance | 0.006*** |
 | (0.0002) |
income | 0.00000 |
 | (0.00001) |
studentYes | -0.647*** |
 | (0.236) |
Constant | -10.870*** |
 | (0.492) |
Observations | 10,000 |
Log Likelihood | -785.800 |
Akaike Inf. Crit. | 1,580.000 |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
M5.or <- exp(coef(M5))
knitr::kable(M5.or, format = 'simple',
caption='被錄取與否的模型')
 | x |
---|---|
(Intercept) | 0.0000 |
balance | 1.0058 |
income | 1.0000 |
studentYes | 0.5237 |
#stargazer::stargazer(M5.or, coef=list(M5.or), title='被錄取與否的模型',
# type=ifelse(knitr::is_latex_output(),"latex","html"),
#label=knitr::opts_current$get("label"))
allcoef <-data.frame(balance=rep(mean(Default$balance),2),
income=rep(mean(Default$income),2),
student=as.factor(c("Yes", "No")))
knitr::kable(allcoef, format='simple')
balance | income | student |
---|---|---|
835.4 | 33517 | Yes |
835.4 | 33517 | No |
M5 <-glm(default ~ balance+income+student, data=Default, family=binomial(logit))
allcoef$pred <-predict(M5, newdata=allcoef, type="response")
allcoef<-cbind(allcoef, allcoef$pred)
allcoef
## balance income student pred allcoef$pred
## 1 835.4 33517 Yes 0.001329 0.001329
## 2 835.4 33517 No 0.002534 0.002534
M5 <-glm(default ~ balance+income+student, data=Default, family=binomial(logit))
M6 <-glm(default ~ balance+income+student, data=Default, family=binomial(probit))
stargazer::stargazer(M5, M6, align=TRUE, title='Logit, Probit模型的估計比較',
type=ifelse(knitr::is_latex_output(),"latex","html"), label=knitr::opts_current$get("label"))
 | logistic | probit |
---|---|---|
Dependent variable: | default | default |
 | (1) | (2) |
balance | 0.006*** | 0.003*** |
 | (0.0002) | (0.0001) |
income | 0.00000 | 0.00000 |
 | (0.00001) | (0.00000) |
studentYes | -0.647*** | -0.296** |
 | (0.236) | (0.119) |
Constant | -10.870*** | -5.475*** |
 | (0.492) | (0.238) |
Observations | 10,000 | 10,000 |
Log Likelihood | -785.800 | -791.600 |
Akaike Inf. Crit. | 1,580.000 | 1,591.000 |
Note: | *p<0.1; **p<0.05; ***p<0.01 |  |
The logit coefficients and standard errors are roughly twice (about 1.8 times) those of the probit model.
Finally, we compare the estimates from the OLS, logit, and probit regression models:
MOLS <- lm(default ~ balance+student, data = Default)
logM <- glm(default ~ balance+student, data = Default,
family = binomial('logit'))
probitM <- glm(default ~ balance+student, data = Default,
family = binomial('probit'))
stargazer::stargazer(MOLS, logM, probitM, title='比較OLS、Logit、Probit迴歸模型',
type=ifelse(knitr::is_latex_output(),"latex","html"),
label=knitr::opts_current$get("label"))
 | OLS | logistic | probit |
---|---|---|---|
Dependent variable: | default | default | default |
 | (1) | (2) | (3) |
balance | 0.0001*** | 0.006*** | 0.003*** |
 | (0.00000) | (0.0002) | (0.0001) |
studentYes | -0.015*** | -0.715*** | -0.343*** |
 | (0.004) | (0.148) | (0.074) |
Constant | -0.073*** | -10.750*** | -5.392*** |
 | (0.003) | (0.369) | (0.173) |
Observations | 10,000 | 10,000 | 10,000 |
R2 | 0.124 |  |  |
Adjusted R2 | 0.124 |  |  |
Log Likelihood |  | -785.800 | -791.700 |
Akaike Inf. Crit. |  | 1,578.000 | 1,589.000 |
Residual Std. Error | 0.168 (df = 9997) |  |  |
F Statistic | 707.100*** (df = 2; 9997) |  |  |
Note: | *p<0.1; **p<0.05; ***p<0.01 |  |  |
If the model includes a binary independent variable, Allison (1999) suggests splitting the sample into two models and then examining the ratio of coefficients across them. See Allison, Paul D. (1999). Comparing Logit and Probit Coefficients Across Groups. Sociological Methods & Research, 28(2), 186–208. https://doi.org/10.1177/0049124199028002003
If the coefficients for one group of observations are always larger than for the other, check carefully whether the sample structure is problematic. The Wald test statistic is computed as:
\[\frac{(\beta_{1,\text{Model}\hspace{.15em}1}-\beta_{1,\text{Model}\hspace{.15em}2})^2}{(s.e.[\beta_{1,\text{Model}\hspace{.15em}1}])^2+(s.e.[\beta_{1,\text{Model}\hspace{.15em}2}])^2}\]
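A sketch computing this Wald statistic for one coefficient across two group-specific logit models (splitting the Default data into students and non-students is our illustration, not Allison's example):
d1 <- subset(ISLR::Default, student=='Yes')
d0 <- subset(ISLR::Default, student=='No')
m1 <- glm(default ~ balance, data=d1, family=binomial('logit'))
m0 <- glm(default ~ balance, data=d0, family=binomial('logit'))
b1 <- coef(m1)['balance']; se1 <- sqrt(vcov(m1)['balance','balance'])
b0 <- coef(m0)['balance']; se0 <- sqrt(vcov(m0)['balance','balance'])
W <- (b1 - b0)^2/(se1^2 + se0^2)
W
pchisq(as.numeric(W), df=1, lower.tail=FALSE)   # compare with chi-squared(1)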
library(kableExtra)
DT <-data.frame(admission=c(rep(1,10),rep(0,10)),
gender=c(rep(1,7), rep(0,3),rep(1,3),rep(0,7)))
knitr::kable(DT, "latex", caption = "Admission and Gender")
library(kableExtra)
DT <-data.frame(Surv=c(rep(1,11),rep(0,22)),
Ag=c(rep(1,9), rep(0,2),rep(1,8),rep(0,14)))
DT<-DT %>%
kable("latex", caption = "Survive and Age")
library(haven); library(dplyr); library(janitor)
#read data
dt <- read_stata("https://grodri.github.io/datasets/mus14data.dta")