Bayesian statistics uses the conditional probability that event B holds given that event A holds to infer the probability that A holds given B.
Suppose there are three restaurants at NCCU. Before they open, several experts are asked to predict the restaurants' relative popularity based on price, menu, and location, producing the following percentages:
library(tidyverse)
library(kableExtra)
Restaurants = c('Bagel','Pasta','Taiwanese food')
pct = c(0.70, 0.14, 0.16)
pcttable <- data.frame(Restaurants, pct) %>%
  add_row(Restaurants = "sum", pct = sum(pct))
knitr::kable(pcttable, digits = 4, align = 'l', caption = 'Predicted restaurant popularity', format = "html", booktabs = TRUE) %>%
  kable_styling(latex_options = "scale_down")
Restaurants | pct |
---|---|
Bagel | 0.70 |
Pasta | 0.14 |
Taiwanese food | 0.16 |
sum | 1.00 |
The prediction is that Bagel will likely be the most popular, with Pasta and Taiwanese food close to each other.
After the predictions are published, one restaurant decides to run an opening promotion and another decides to step up its advertising. The experts, taking the promotional tactics into account, predict again and assign the three restaurants new popularity probabilities.
That is, we start with the three restaurants' popularity before the promotions, and the scale of the promotions shapes our judgment about popularity afterward: popularity is the effect and promotion is the cause. The model linking these two variables is the likelihood.
By Bayes' theorem, the prior popularity multiplied by the likelihood gives the posterior distribution. But the posterior must be expressed as probabilities between 0 and 1, so we divide by the marginal probability of the data (the normalizing constant), rescaling the posterior to \(0\sim 1\).
Restaurants = c('Bagel','Pasta','Taiwanese food')
prior = c(0.70, 0.14, 0.16)
likelihood = c(0.276, 0.314, 0.272)
product = prior * likelihood
posterior = product / sum(product)
pcttable <- data.frame(Restaurants, prior, likelihood, product, posterior) %>%
  add_row(Restaurants = "sum", prior = sum(prior),
          likelihood = NA,
          product = sum(product),
          posterior = sum(posterior))
knitr::kable(pcttable, digits = 4, align = 'l', caption = 'Predicted popularity and updated subjective probabilities', format = "html", booktabs = TRUE) %>%
  kable_styling(latex_options = "scale_down")
Restaurants | prior | likelihood | product | posterior |
---|---|---|---|---|
Bagel | 0.70 | 0.276 | 0.1932 | 0.6883 |
Pasta | 0.14 | 0.314 | 0.0440 | 0.1566 |
Taiwanese food | 0.16 | 0.272 | 0.0435 | 0.1551 |
sum | 1.00 | NA | 0.2807 | 1.0000 |
Notice that Taiwanese food's posterior probability is lower than Pasta's even though its prior probability was higher: the information about the promotions has updated the initial subjective probabilities.
If new relevant information arrives, such as students' daily lunch budgets or the reviews they leave after eating, we can update the current posterior probabilities again.
This whole process is Bayesian statistics. We will gradually introduce continuous prior distributions and show how to derive the posterior distribution.
Previously we introduced parameters and statistics. A parameter is a feature of the population, such as its mean or dispersion; a statistic is a feature of the sample, such as the sample mean or the sample standard deviation.
We often count the occurrences of some event, for example the numbers of heads and tails when tossing a coin, or the yearly number of applicants to and acceptance rate of the national civil-service examinations. The relative frequency of an event in the sample space is called its probability. If we tossed a coin endlessly, or held the examination endlessly, we should find the proportion of heads approaching 50%, and the acceptance rate approaching some number: when the acceptance rate rises, more people apply, which pushes the rate back down, after which applications fall and the rate rises again.
Bayesian statistics has two characteristics. The first is subjective probability: the parameter is assumed to change with the data, rather than being a fixed number discovered after countless repetitions of an experiment. From this follows the second characteristic: new data can update the initial subjective probability; that is, the unconditional subjective probability is constrained by the data we subsequently observe and by the model.
Going further, subjective probability is a notion of uncertainty. We are not one hundred percent sure that endless coin tosses would yield heads 50% of the time; at best we are 95% confident. Nor do we know whether the acceptance rate will approach some number such as 5%; at best we are 60% confident. Moreover, most of the time we cannot repeat an experiment on a sample space endlessly. If we understand probability as a distribution, we can say that the probability of "50% heads" is 95%. Uncertainty itself is something we can model.
The steps of Bayesian statistics follow this relation:
\[ \text{posterior}\propto \text{likelihood}\times\text{prior} \]
For example, roll two dice and let X and Y denote their values. Knowing that one die shows a 3 changes the probability that the sum is 4:
\[\begin{align} p(X+Y=4|X=3) & =\frac{p(X+Y=4,X=3)}{p(X=3)}\\ & = \frac{p(\{3,1\})}{p(\{3,1\},\{3,2\},\{3,3\},\{3,4\},\{3,5\},\{3,6\})}\\ & = \frac{1/36}{6/36} \\ & = \frac{1}{6} \end{align}\]
In other words, because we learn that one of the dice shows a 3, we update the probability we originally took to be \(\frac{1}{36}\) to \(\frac{1}{6}\).
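This update can be checked by enumerating the 36 equally likely outcomes in R (a minimal sketch; `dice` is a name we introduce here):

dice <- expand.grid(X = 1:6, Y = 1:6)
# Unconditional probability that X = 3 and Y = 1
mean(dice$X == 3 & dice$Y == 1)            # 1/36
# Conditional probability that the sum is 4, given X = 3
with(dice, mean((X + Y == 4)[X == 3]))     # 1/6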
Note the Law of Total Probability here. Suppose we have an event A and events \(B_{1}, B_{2}, \ldots, B_{n}\) that partition the sample space S. Then the following relationship holds:
\[ P(A)=P(A|B_{1})\cdot P(B_{1})+P(A|B_{2})\cdot P(B_{2})+\cdots+P(A|B_{n})\cdot P(B_{n}) \]
\[\begin{equation} P(B|A) = \frac{P(B\cap A )}{P(A)}=\frac{P(A|B)\cdot P(B)}{P(A)} \tag{2.1} \end{equation}\]
\[ P(B\cap A)=P(A|B)P(B) \]
\[ P(B'\cap A)=P(A|B')P(B') \]
\[ P(A|B)\cdot P(B)+P(A|B')\cdot P(B')=P(A) \]
\[\begin{equation} \frac{P(A|B)\cdot P(B)}{P(A)} = \frac{P(A|B)P(B)}{P(A|B)P(B)+P(A|B')P(B')} \tag{2.2} \end{equation}\]
\[ \Pr(B|A) \propto \Pr(A|B)\Pr(B) \]
When introducing the law of total probability, we used this example: an X-ray machine correctly detects tuberculosis (TB) with 90% probability when the patient has TB, and misses it with 10% probability. If the patient does not have TB, the machine correctly shows no TB with 99% probability but falsely indicates TB with 1% probability. Suppose only 0.01% of people in Taiwan have TB, and we randomly select one person for testing. Earlier we asked how likely the machine is to indicate TB; now we ask how likely the person is to actually have TB when the machine indicates TB.
Let TB denote the event of having tuberculosis, TB' the event of not having it, and D the event that the machine indicates TB. From the problem we know:
\[ P(TB) = 0.0001\\ P(TB')=0.9999\\ P(D|TB)=0.9\\ P(D|TB')=0.01 \]
\[\begin{align} P(TB|D) & = \frac{P(TB, D)}{P(D)}\\ & = \frac{P(D|TB)P(TB)}{P(D)} \\ & = \frac{P(D|TB)P(TB)}{P(D|TB)P(TB)+P(D|TB')P(TB')} \\ &= \frac{0.9\cdot 0.0001}{0.9\cdot 0.0001+0.01\cdot 0.9999} \\ & = 0.0089 \\ & = 0.89\% \end{align}\]
The result is close to 1%. Because the base rate of TB is only 0.01%, a positive reading raises the probability of actually having TB, but only to about 0.9%.
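The same arithmetic in R (a minimal sketch; the variable names are ours):

p_tb <- 0.0001   # prior: Pr(TB)
sens <- 0.90     # Pr(D | TB), a positive reading given TB
fpr  <- 0.01     # Pr(D | TB'), a false positive
# Bayes theorem: Pr(TB | D)
(p_tb * sens) / (p_tb * sens + (1 - p_tb) * fpr)
# roughly 0.0089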
In this example, the prevalence of tuberculosis and the accuracy of the X-ray machine are objective facts. But often we can only imagine the probabilities subjectively. For example, imagine that 30% of candidates buy votes while 70% run clean campaigns, that a vote-buying candidate is reported with probability 10%, and that a clean candidate is falsely reported for vote buying with probability 3%. Given that a candidate is reported for vote buying, how likely is it that they actually bought votes? The probability of vote buying is subjectively imagined, and so is the probability of being reported; but through Bayes' theorem we can still obtain a relatively objective estimate.
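Plugging these subjective probabilities into Bayes' theorem gives (our worked computation, not in the original):
\[ \Pr(\text{bribed}\mid \text{reported}) = \frac{0.10\times 0.30}{0.10\times 0.30 + 0.03\times 0.70} = \frac{0.030}{0.051} \approx 0.59 \]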
Suppose the sample space is a class of 20 third-year and 30 second-year students. Among the second-year students, 50% commute by bus; among the third-year students, 30% do. If we see a classmate on the bus, what is the probability that they are a third-year student?
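For reference, the same reasoning applies here (our worked computation):
\[ \Pr(\text{third-year}\mid \text{bus}) = \frac{0.3\times \frac{20}{50}}{0.3\times \frac{20}{50} + 0.5\times \frac{30}{50}} = \frac{0.12}{0.42} = \frac{2}{7}\approx 0.29 \]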
As the examples above show, conditional probabilities and marginal probabilities are key ingredients of Bayesian statistics. Below we use conditional probabilities among three events as an illustration.
Here we introduce the idea of two events being independent conditional on a third. Consider rolling two dice: let A be the event that the first die shows an even number, B the event that the second die shows an odd number, and C the event that the two dice sum to 7. Conditional independence is written:
\[ A \mbox{$\perp\!\!\!\perp$} B \mid C \]
\[ \text{Pr}(C\mid A)=\frac{\Pr(A\cap C)}{\Pr(A)}=\frac{3/36}{18/36}=\frac{1}{6} \]
Moreover \(\Pr(C)=\frac{1}{6}\), so whether A holds does not change the probability of C: A and C are independent.
Similarly, B and C are independent.
Given that the two dice sum to 7, the probability that the second die shows an odd number (event B) is \(\frac{3}{6}=\frac{1}{2}\). This can also be computed as:
\[\begin{align} \text{Pr}(B\mid C)&=\frac{\text{Pr}(B, C)}{\text{Pr}(C)}\\ & =\frac{3/36}{1/6} \\ & = \frac{1}{2} \\ & = \text{Pr}(B) \end{align}\]
From the above, \(\text{Pr}(C\mid A)=\text{Pr}(C)\) and \(\text{Pr}(B\mid C)=\text{Pr}(B)\): C is independent of A and of B. A and B are themselves independent as well, since \(\Pr(A\cap B)=\frac{9}{36}=\frac{1}{4}=\Pr(A)\Pr(B)\). Conditioning on C changes this, however: if the two dice sum to 7, knowing that the first die is even tells us the second die must be odd, so A and B are not independent given C.
We can make this explicit by writing out the relationships among the three events:
\[\begin{align} \Pr(A\mid B, C) & = \frac{\Pr(A)\times \Pr(B,C\mid A)}{\Pr(B, C)}\\ & =\frac{1/2\times 1/6}{1/12} \\ & = 1 \end{align}\]
Given B and C, the event A is certain: a sum of 7 with an odd second die forces an even first die, so \(\Pr(A\mid B,C)\neq\Pr(A\mid C)=\frac{1}{2}\).
\[\begin{align} \Pr(C\mid A, B) & = \frac{\Pr(C)\times \Pr(A, B\mid C)}{\Pr(A, B)}\\ & =\frac{\Pr(A,B,C)}{\Pr(A, B)}\\ & = \frac{1/12}{1/4}\\ & = \frac{1}{3} \end{align}\]
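These conditional probabilities can be verified by enumerating the 36 outcomes in R (a minimal sketch; the event definitions follow the text):

dice <- expand.grid(d1 = 1:6, d2 = 1:6)
A <- dice$d1 %% 2 == 0          # first die is even
B <- dice$d2 %% 2 == 1          # second die is odd
C <- dice$d1 + dice$d2 == 7     # the two dice sum to 7
mean(C[A])       # Pr(C | A) = 1/6
mean(B[C])       # Pr(B | C) = 1/2
mean(A & B)      # Pr(A, B) = 1/4 = Pr(A) Pr(B)
mean(A[B & C])   # Pr(A | B, C) = 1
mean(C[A & B])   # Pr(C | A, B) = 1/3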
The voter file FLVoters.csv contains the following variables:

Name | Description |
---|---|
surname | Surname |
county | County census id |
VTD | Voting district census id |
age | Age |
gender | Gender |
race | Self-reported race |
The surname list names.csv contains the following variables:

Name | Description |
---|---|
surname | Frequently occurring surname |
counts | Number of people who have this surname |
pctwhite | Percent of non-Hispanic whites who have this surname |
pctblack | Percent of non-Hispanic blacks who have this surname |
pcthispanic | Percent of Hispanics who have this surname |
pctapi | Percent of non-Hispanic Asian and Pacific Islanders who have this surname |
pctothers | Percent of others who have this surname |
The census file FLCensusVTD.csv provides the proportion of each group in each voting district:

Name | Description |
---|---|
county | County census id of voting district |
VTD | Voting district census id |
total.pop | Total population of voting district |
white | Proportion of non-Hispanic whites in voting district |
black | Proportion of non-Hispanic blacks in voting district |
api | Proportion of non-Hispanic Asian and Pacific Islanders in voting district |
hispanic | Proportion of voters of Hispanic origin in voting district |
others | Proportion of the other racial groups in voting district |
The data above give us \(\Pr(R_{i}=r, G_{i}=g)\) and \(\Pr(R_{i}=r|S_{i}=s)\): the joint probability of racial group and residence, and the probability of racial group conditional on surname.
What we care about, however, is the conditional probability \(\Pr(R_i = r \mid S_i = s, G_i = g)\): the probability of belonging to a particular racial group given surname and residence. We do not know the joint probability of the three events, \(\Pr(R\cap S\cap G)\).
By Bayes' theorem, the relationship between residence and racial group can be written as:
\[\Pr(G_i = g\mid R_i = r) = \frac{\Pr(R_i = r\mid G_i = g)\Pr(G_i = g)}{\sum_{g^\prime} \Pr(R_i = r\mid G_i = g^\prime)\Pr(G_i = g^\prime)}.\]
- If we add surnames and assume \(G_i \mbox{$\perp\!\!\!\perp$}S_i \mid R_i\), the relationship among the three events becomes:
\[\Pr(R_i = r \mid S_i = s, G_i = g) = \frac{\Pr(G_i = g\mid R_i = r)\Pr(R_i =r \mid S_i =s)}{\sum_{r^\prime} \Pr(G_i =g \mid R_i = r^\prime)\Pr(R_i =r^\prime \mid S_i = s)}\]
Because G and S are independent conditional on R, the joint probability of S and G given R factors as \(\Pr(G, S\mid R)=\Pr(G\mid R)\Pr(S\mid R)\). This factorization is what allows us to express \(\Pr(R\mid S,G)\) using only \(\Pr(G\mid R)\) and \(\Pr(R\mid S)\), as in the formula above.
The corresponding data:

- FLCensusVTD.csv: \(\Pr(R_i = r\mid G_i = g)\)
- FLCensusVTD.csv: \(\Pr(G_i = g)\)
- names.csv: \(\Pr(R_i =r \mid S_i =s)\)

First use FLCensusVTD.csv to compute, for each voting district \(i\), the numerator of Bayes' rule:
\[
\Pr(R_i = r\mid G_i = g)\times\Pr(G_{i}=g)=\frac{r_{i}}{population_{i}}\times\frac{population_{i}}{total \hspace{0.5em} population}=\frac{r_{i}}{total \hspace{0.5em} population}
\]
Normalizing over districts then gives
\[
\Pr(G_i = g\mid R_i = r)= \frac{r_{i}}{total\hspace{.4em} population\hspace{.4em} of\hspace{.4em} group\hspace{.4em} r}
\]
Then compute \(\Pr(R_i = r \mid S_i = s)\). Since names.csv already reports these as percentages, dividing by 100 suffices:
# Read data
library(tidyverse)
setwd('~/Dropbox/EastAsia2024/')
census <- read_csv("data/gov2017data/FLCensusVTD.csv",
show_col_types = FALSE)
voters <- read_csv("data/gov2017data/FLVoters.csv",
show_col_types = FALSE)
names <- read_csv("data/gov2017data/names.csv",
show_col_types = FALSE)
# 1. Pr(Ri = r | Gi = g): `white`:`others` columns from census
#census <- census %>%
#select(white:others)
# 2. Pr(Gi = g): make new column called `total.prop`
census <- census |>
mutate(total.prop = total.pop/sum(census$total.pop))
# Apply Bayes rule to compute Pr(Gi = g | Ri = r)
census <- census |>
mutate(across(white:others, ~ .x*total.prop)) |> # compute numerator: Pr(Ri = r | Gi = g) Pr(Gi = g)
mutate(across(white:others, ~ .x/sum(.x), .names = "p_geo_{.col}")) |> # divide by denominator, and name it as p_geo_white, ...
dplyr::select(-c(white:others, total.pop, total.prop))
# 3. Pr(Ri = r | Si = s): divide `pctwhite`:`pctothers` by 100 and rename it to p_white_name, ...
names <- names |>
mutate(across(pctwhite:pctothers, ~ .x/100)) |>
rename_with(~ gsub("pct", "p_name_", .x), starts_with("pct"))
# Merge data.frames and generate prediction
voters <- voters |>
  left_join(names, by = "surname") |>
  left_join(census, by = c("county", "VTD")) |>
  mutate(white = p_name_white * p_geo_white,
         black = p_name_black * p_geo_black,
         api = p_name_api * p_geo_api,
         hispanic = p_name_hispanic * p_geo_hispanic,
         others = p_name_others * p_geo_others) |>
  # Normalize across racial groups so each voter's posterior
  # probabilities sum to 1 (the denominator of Bayes rule)
  mutate(total = white + black + api + hispanic + others,
         across(c(white, black, api, hispanic, others), ~ .x / total))
#select cases & variables
tmp.10 <- voters[c(1:20), c('surname', 'white', 'black', 'api', 'hispanic')]
#Long table
tmp.10 <- reshape2::melt(tmp.10)
#set up orders
sel_order <-
tmp.10 %>%
filter(variable == "white") %>%
arrange(desc(value)) %>%
mutate(labels = factor(surname))
tmp.data <- tmp.10 %>%
mutate(surname = factor(surname, levels = sel_order$labels, ordered = TRUE)) %>%
mutate(race = variable)
#setwd('~/Dropbox/EastAsia2024/')
nameandrace <- here::here('Fig','nameandrace.png')
knitr::include_graphics(nameandrace)
Figure 2.1: Surname and race probabilities
For binomial data with \(Y\) successes in \(n\) trials and a \(\text{Beta}(a_{0}, b_{0})\) prior on \(\theta\), the posterior is:
\[\begin{equation} p(\theta\mid Y) \propto \binom{n}{Y}\theta^{Y}(1-\theta)^{n-Y}\times\frac{\Gamma(a_{0}+b_{0})}{\Gamma(a_{0})\Gamma(b_{0})}\theta^{a_{0}-1}(1-\theta)^{b_{0}-1} \\ \propto \text{Beta}(Y+a_{0},n-Y+b_{0}) \tag{3.1} \end{equation}\]
Equation (3.1) has two parts. The first is the binomial likelihood:
\[ \binom{n}{Y}\theta^{Y}(1-\theta)^{n-Y} \]
The second is the Beta prior:
\[ \frac{\Gamma(a_{0}+b_{0})}{\Gamma(a_{0})\Gamma(b_{0})}\theta^{a_{0}-1}(1-\theta)^{b_{0}-1} \]
The prior probability multiplied by the likelihood is proportional to the posterior distribution.
The \(\Gamma\) function, evaluated at a positive integer \(n\), is:
\[ \Gamma(n)=(n-1)!=1\times 2\times \cdots\times (n-1) \]
\[ n\Gamma(n)=\Gamma(n+1) \]
The density of the Beta distribution is:
\[ p(\theta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1} \]
The Beta distribution has two parameters, \(\alpha\) and \(\beta\). Because its shape changes as the parameters change, it is often used as the prior distribution in Bayesian statistics. The Beta density looks somewhat similar to the binomial formula. The binomial distribution is:
\[ p(x) = \binom{n}{x}\theta^x(1-\theta)^{n-x} \]
where \(x\) is the number of successes, \(n\) is the number of trials (or the number of cases in one experiment), and \(\binom{n}{x}\) denotes \(\frac{n!}{x!(n-x)!}\). The mean of the binomial distribution is \(n\theta\) and its variance is \(n\theta(1-\theta)\).
By Bayes' theorem,
\[ p(\theta\mid y)\propto p(y\mid \theta)p(\theta) \]
\[\begin{equation} p(\theta\mid y) \propto \binom{n}{y}\theta^{y}(1-\theta)^{n-y}\times \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1} \tag{3.2} \end{equation}\]
\[ p(\theta\mid y) \propto \theta^{\alpha+y-1}(1-\theta)^{\beta+n-y-1} \]
\[ \theta\mid y \sim \text{Beta}(\alpha+y,\hspace{.2em}\beta+n-y) \]
The mean of a \(\text{Beta}(\alpha, \beta)\) distribution is:
\[ \frac{\alpha}{\alpha+\beta} \]
Applying this to the posterior \(\text{Beta}(Y+a_{0},\hspace{.2em} n-Y+b_{0})\):
\[\begin{align} \mathbb{E}(\theta\mid Y) & = \frac{Y+a_{0}}{a_{0}+b_{0}+n}\\ & = \frac{a_{0}+b_{0}}{a_{0}+b_{0}+n}\times \frac{a_{0}}{a_{0}+b_{0}}+\frac{n}{a_{0}+b_{0}+n}\times \frac{Y}{n}\\ & \xrightarrow{n\to\infty} \frac{Y}{n} \end{align}\]
The conclusion is that as the sample size increases, the posterior mean under the binomial model converges to the sample proportion \(\frac{Y}{n}\); the influence of the prior distribution declines as the sample grows.
Another key point is that the Beta parameters change from \((a_{0}, b_{0})\) to \((a_{0}+Y,\hspace{.3em}b_{0}+n-Y)\): the data enter the model, and the resulting posterior can serve as the prior when further data arrive, updating again and again. This is the spirit of Bayesian statistics: new information updates the expected distribution.
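A minimal sketch of this Beta-binomial updating in R (the values a0 = 2, b0 = 2, Y = 7, n = 10 are ours, chosen only for illustration):

# Prior Beta(a0, b0); data: Y successes in n trials
a0 <- 2; b0 <- 2
Y <- 7; n <- 10
# Posterior is Beta(a0 + Y, b0 + n - Y)
a1 <- a0 + Y
b1 <- b0 + n - Y
# Posterior mean...
a1 / (a1 + b1)
# ...equals the weighted average of the prior mean and the sample proportion
(a0 + b0) / (a0 + b0 + n) * a0 / (a0 + b0) + n / (a0 + b0 + n) * Y / n
# Posterior variance
a1 * b1 / ((a1 + b1)^2 * (a1 + b1 + 1))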
Rearranging, the posterior mean can be written as
\[\begin{align} \mathbb{E}(\theta\mid Y) & =\frac{a_{0}+b_{0}}{a_{0}+b_{0}+n}\times\frac{a_{0}}{a_{0}+b_{0}}+\frac{n}{a_{0}+b_{0}+n}\times\frac{Y}{n} \end{align}\]
Because \(\frac{a_{0}}{a_{0}+b_{0}}\) is the mean of the prior distribution and \(\frac{Y}{n}\) is the sample mean, the posterior mean is a weighted average of the prior mean and the sample mean.
The variance of the posterior distribution is:
\[ \text{Var}(\theta\mid Y)=\frac{(Y+a_{0})(b_{0}+n-Y)}{(a_{0}+b_{0}+n)^2(a_{0}+b_{0}+n+1)} \]
The prior predictive distribution is \(\Pr(Y|X)=\int \Pr(Y|X,\theta)P(\theta)d\theta\).
The prior distribution \(\Pr(\theta)\) expresses the expectation that the parameter comes from some distribution, for example a Beta, normal, or binomial distribution.
The posterior distribution is \(\Pr(\theta\mid Y)\): the prior distribution multiplied by the likelihood of data observed from a particular distribution, \(Y\mid \theta\), for example a binomial model. The posterior is our estimate of the parameter's distribution, informed by the data at hand.
The prior predictive distribution can be thought of as the predictions we would obtain by sampling at random from the prior \(P(\theta)\); that is, before bringing in the likelihood, we first predict the data from the prior alone. If we use a uniform distribution instead of a Beta distribution as the prior, the resulting prior predictive distribution is:
\[\begin{align} \Pr(y=k) & = \int_{0}^{1}\Pr(y=k\mid \theta)d\theta \\ & = \int_{0}^{1}\binom{n}{k}\theta^k(1-\theta)^{n-k}d\theta \\ & = \binom{n}{k}\frac{\Gamma(k+1)\Gamma(n-k+1)}{\Gamma(n+2)} \\ & = \frac{n!}{k!(n-k)!}\frac{k!(n-k)!}{(n+1)!} \\ & = \frac{1}{n+1} \end{align}\]
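The \(\frac{1}{n+1}\) result can be checked numerically (a minimal sketch; the values of n and k are arbitrary):

n <- 10
k <- 3
# Integrate the binomial likelihood against a uniform prior on theta
integrate(function(theta) dbinom(k, n, theta), lower = 0, upper = 1)$value
1 / (n + 1)  # both are 0.0909...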
Next, consider normally distributed data with known variance \(\sigma_{0}^2\):
\[ X\mid \mu \sim \mathcal{N}(\mu, \sigma_{0}^2),\qquad p(x\mid\mu)=(2\pi\sigma_{0}^2)^{-\frac{1}{2}}\text{exp}\left[-\frac{1}{2\sigma_{0}^2}(x-\mu)^2\right] \]
\[ -\infty<\mu<\infty \]
The prior for the unknown mean \(\mu\) is also normal:
\[ \mu\mid m,s^2\sim\mathcal{N}(m,s^2),\qquad p(\mu)=(2\pi s^2)^{-\frac{1}{2}}\text{exp}\left[-\frac{1}{2s^2}(\mu-m)^2\right] \]
\[\begin{align} p(\mu\mid x) & \propto p(x\mid\mu)p(\mu)\\ & \propto \prod_{i=1}^{n}\text{exp}\left[-\frac{1}{2\sigma_{0}^2}(x_{i}-\mu)^2\right]\text{exp}\left[-\frac{1}{2s^2}(\mu-m)^2\right] \\ & = \text{exp}\left[-\frac{1}{2}\left(\frac{1}{\sigma_{0}^2}\sum_{i=1}^{n}(x_{i}-\mu)^2+\frac{1}{s^2}(\mu-m)^2\right)\right] \\ & = \text{exp}\left[-\frac{1}{2}\left(\frac{1}{\sigma_{0}^2}\sum_{i=1}^{n}(x_{i}^2-2x_{i}\mu+\mu^2)+\frac{1}{s^2}(\mu^2-2\mu m+m^2)\right)\right] \\ & = \text{exp}\left[-\frac{1}{2}\frac{1}{\sigma_{0}^2 s^2}\left(s^2 \sum_{i=1}^{n}x_{i}^2-2s^2\mu n\bar{x}+n\mu^2 s^2+\sigma_{0}^2 \mu^2-2\sigma_{0}^2\mu m+\sigma_{0}^2 m^2\right) \right] \\ & = \text{exp}\left[-\frac{1}{2}\frac{1}{\sigma_{0}^2 s^2}\left(\mu^2 (\sigma_{0}^2+ns^2)-2\mu(m\sigma_{0}^2+s^2 n\bar{x})+(m^2\sigma_{0}^2+s^2\sum_{i=1}^{n}x_{i}^2)\right)\right] \end{align}\]
\[\begin{align} & \propto \text{exp}\left[-\frac{1}{2}\left(\mu^2\left(\frac{1}{s^2}+\frac{n}{\sigma_{0}^2}\right)-2\mu\left(\frac{m}{s^2}+\frac{n\bar{x}}{\sigma_{0}^2}\right)+k\right)\right] \\ & = \text{exp}\left[-\frac{1}{2}\left(\frac{1}{s^2}+\frac{n}{\sigma_{0}^2}\right)\left(\mu^2-2\mu\frac{\frac{m}{s^2}+\frac{n\bar{x}}{\sigma_{0}^2}}{\frac{1}{s^2}+\frac{n}{\sigma_{0}^2}} +k^{\prime} \right) \right] \\ & \propto \text{exp}\left[-\frac{1}{2}\left(\frac{1}{s^2}+\frac{n}{\sigma_{0}^2}\right) \left(\mu-\frac{\frac{m}{s^2}+\frac{n\bar{x}}{\sigma_{0}^2}}{\frac{1}{s^2}+\frac{n}{\sigma_{0}^2}}\right)^2 \right] \tag{3.3} \end{align}\]
Completing the square shows that the posterior of \(\mu\) is normal, with mean
\[ \tilde{\mu}=\frac{\frac{m}{s^2}+\frac{n\bar{x}}{\sigma_{0}^2}}{\frac{1}{s^2}+\frac{n}{\sigma_{0}^2}} \]
and variance
\[ \sigma_{\mu}^2=\left(\frac{1}{s^2}+\frac{n}{\sigma_{0}^2}\right)^{-1} \]
Because \(\tilde{\mu}\) depends on \(\bar{x}\), the sample mean can be used to estimate the mean of the posterior distribution.
The precision of the posterior distribution depends on the precision of the data (\(\frac{n}{\sigma_{0}^2}\)) and the precision of the prior (\(\frac{1}{s^2}\)); the posterior precision is the sum of the two:
\[ \frac{1}{\sigma_{\mu}^2}= \frac{1}{s^2}+\frac{n}{\sigma_{0}^2} \]
As the sample size grows, the prior washes out:
\[ \lim _{n\rightarrow\infty}\tilde{\mu}=\bar{x} \]
and, for large \(n\), the posterior variance is approximately that of the sample mean:
\[ \sigma_{\mu}^2\approx\frac{\sigma_{0}^2}{n} \]
The R example below applies these formulas, using a normal prior for a Gini coefficient (mu0 = 0.46, tau0 = 0.3) together with data summarized by sigma = 1.25, n = 30, and ybar = 0.53.
library(tidyverse)
library(janitor)
library(kableExtra)
library(viridis)
library(scales)
# prior
mu0 = 0.46
tau0 = 0.3
# data
sigma = 1.25
n = 30
ybar = 0.53
# posterior variance
tau_n = 1 / sqrt(1 / tau0 ^ 2 + n / sigma ^ 2 )
# posterior mean
mu_n = mu0 * (1 / tau0 ^ 2) / (1 / tau_n ^ 2) + ybar * (n / sigma ^ 2) / (1 / tau_n ^ 2)
df = data.frame(c("Precision", "SD", "Mean"),
c(1 / tau0 ^ 2, tau0, mu0),
c(n / sigma ^ 2, sigma / sqrt(n), ybar),
c(1 / tau_n ^ 2, tau_n, mu_n))
df %>%
kbl(digits = 3,
col.names = c("","Prior", "Data (Sample Mean)", "Posterior")) %>%
kable_styling()
 | Prior | Data (Sample Mean) | Posterior |
---|---|---|---|
Precision | 11.11 | 19.200 | 30.311 |
SD | 0.30 | 0.228 | 0.182 |
Mean | 0.46 | 0.530 | 0.504 |
bayes_col = c("#EA11BB", "#9000EE", "#9EE007")
names(bayes_col) = c("prior", "likelihood", "posterior")
bayes_lty = c("dashed", "dotted", "solid")
names(bayes_lty) = c("prior", "likelihood", "posterior")
# Plotting
# scaled likelihood, depends on data: n, ybar
likelihood_scaled <- function(theta) {
likelihood <- function(theta) {
dnorm(x = ybar, mean = theta, sd = sigma / sqrt(n))
}
scaling_constant <- integrate(likelihood, lower = 0, upper = 1)[[1]]
likelihood(theta) / scaling_constant
}
# Plot
ggplot(data.frame(x = c(-0.5, 1.5)),
aes(x = x)) +
# prior
stat_function(fun = dnorm,
args = list(mean = mu0,
sd = tau0),
lty = bayes_lty["prior"],
linewidth = 1,
aes(color = "prior", linetype = "prior")) +
# (scaled) likelihood
stat_function(fun = likelihood_scaled,
lty = bayes_lty["likelihood"],
linewidth = 1,
aes(color = "likelihood", linetype = "likeihood")) +
# posterior
stat_function(fun = dnorm,
args = list(mean = mu_n,
sd = tau_n),
lty = bayes_lty["posterior"],
linewidth = 1,
aes(color = "posterior", linetype = "posterior")) +
# Define color and add a legend
scale_color_manual(name = "",
breaks = c("prior", "likelihood", "posterior"),
values = bayes_col[c("prior", "likelihood", "posterior")]) +
scale_linetype_manual(name = "",
breaks = c("prior", "likelihood", "posterior"),
values = bayes_lty[c("prior", "likelihood", "posterior")]) +
labs(x = "theta",
y = "") +
theme_bw()
Figure 3.1: Prior and posterior densities of the Gini coefficient
# Posterior median (equal to the posterior mean, since the posterior is normal)
qnorm(0.50, mu_n, tau_n)
[1] 0.5043
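Because the posterior is normal, other summaries also come from qnorm; for example, a 95% central credible interval (our addition, using the mu_n and tau_n computed above):

# 95% credible interval for theta
qnorm(c(0.025, 0.975), mu_n, tau_n)
# approximately 0.15 to 0.86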
#https://bookdown.org/kevin_davisross/stat415-handouts/normal-normal.html
When the mean is known but the variance is unknown, the conjugate prior involves the inverse chi-squared distribution. For \(\nu\) degrees of freedom its density is:
\[ f(x\mid \nu)=\frac{2^{-\nu/2}}{\Gamma(\nu/2)}x^{-\frac{\nu+2}{2}}e^{-\frac{1}{2x}},\quad x>0 \]
# Degrees of freedom
df <- 4
# Random samples: if W ~ chi-squared(df), then 1/W follows the
# inverse chi-squared distribution with df degrees of freedom
n <- 1000
set.seed(2024)
samples <- 1 / rchisq(n, df)
# Plot histogram of the samples (one wide final bin absorbs the heavy right tail)
brks <- c(seq(0, 2, by = 0.05), max(2.1, max(samples)))
hist(samples, freq = FALSE, breaks = brks,
     main = "Histogram of Inverse Chi-Squared Samples",
     xlab = "Value", ylab = "Density", col = "#21CCAA",
     xlim = c(0, 2), ylim = c(0, 4))
# Overlay the density f(x | nu) given above
curve(2^(-df/2) / gamma(df/2) * x^(-(df + 2)/2) * exp(-1/(2 * x)),
      from = 0.01, to = 2, add = TRUE, col = "#CA1021", lwd = 2)
Figure 3.2: The inverse chi-squared distribution
\[ Y_{i}\sim \mathcal{N}(\mu,\sigma^2)\hspace{.4em}\text{for}\hspace{.3em}i=1,2,\ldots,n\hspace{.3em}\text{with known}\hspace{.3em}\mu \]
With a scaled inverse chi-squared prior \(\text{Inv-}\chi^2(\nu_{0}, s_{0}^2)\) on \(\sigma^2\), the posterior is:
\[\begin{align} p(\sigma^2\mid Y,\mu) & \propto \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\sigma}\text{exp}\Big\{-\frac{1}{2\sigma^2}(Y_{i}-\mu)^2\Big\}\times\frac{(s_{0}^2\nu_{0}/2)^{\nu_{0}/2}}{\Gamma(\nu_{0}/2)}(\sigma^2)^{-(\nu_{0}/2+1)}\text{exp}\left(-\frac{s_{0}^2\nu_{0}}{2\sigma^2}\right)\\ & \propto \text{Inv-}\chi^2\left\{n+\nu_{0},\frac{1}{n+\nu_{0}}\left(ns_{n}^2+\nu_{0}s_{0}^2\right)\right\} \end{align}\]
where \(s_{n}^2=\frac{1}{n}\sum_{i=1}^{n}(Y_{i}-\mu)^2\).
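A minimal sketch of this update in R with simulated data (all numbers here are ours, chosen only for illustration):

# Known mean, unknown variance
mu <- 0
set.seed(1234)
y <- rnorm(50, mean = mu, sd = 2)
n <- length(y)
# Prior: scaled inverse chi-squared with nu0 degrees of freedom and scale s0^2
nu0 <- 5
s02 <- 1
# Data: mean squared deviation from the known mu
sn2 <- mean((y - mu)^2)
# Posterior: Inv-chi^2(n + nu0, (n * sn2 + nu0 * s02) / (n + nu0))
nu_n <- n + nu0
scale_n <- (n * sn2 + nu0 * s02) / (n + nu0)
c(nu_n = nu_n, scale_n = scale_n)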