Your colleague either commutes by train or by the bus. 20 days of the month, she takes the train and the remaining 10 days she takes the bus. If she takes the train, she reaches work on time with a probability of \(0.9\). If she takes the bus, she frequently gets stuck in traffic and reaches work on time with a probability of \(0.5\). Given that she was on time today, what is the probability that she took the bus to work today?
\[P\left( T \right) =\frac { 20 }{ 20+10 }, \quad P\left( B \right) =\frac { 10 }{ 20+10 } =1-P\left( T \right)\]
\[P\left( { OT }|{ T } \right) =0.9, \quad P\left( { OT }|{ B } \right) =0.5\]
p_T <- 20 / (20 + 10)
p_B <- 1 - p_T
p_OTT <- 0.9
p_OTB <- 0.5
\[P(B|OT)=\frac { P\left( { OT }|B \right) P\left( B \right) }{ P\left( { OT }|{ T } \right) P\left( T \right) +P\left( { OT }|B \right) P\left( B \right) }\]
(p_BOT <- (p_OTB * p_B) / (p_OTT * p_T + p_OTB * p_B))
## [1] 0.2173913
\[P(B|OT)=\frac { 0.5\left( \frac { 10 }{ 20+10 } \right) }{ 0.9\left( \frac { 20 }{ 20+10 } \right) +0.5\left( \frac { 10 }{ 20+10 } \right) } = 0.2173913\]
In the Grade Network that we looked at in the notes:
We have 5 random variables. \((D,I,G,S,L)\) – which correspond to Diffculty, Intelligence, Grade, SAT Score, Letter of Recommendation. If the student gets a good grade in that course, the student has a higher probability of receiving a good recommendation letter. Likewise, if the student in Intelligent, they have a higher chance of receiving a good SAT score. In general, higher the intelligence, the better their course grade and more difficult the course, the harder it is to get a good grade. This information can be encoded in the figure shown below. In addition, the conditional probability tables are shown next to the corresponding nodes. This graph represents a Bayesian network and factors the joint probability function \(p(d,i,g,s,l) = p(d)p(i)p(g|i,d)p(s|i)p(l|g)\).
What happens to the probability of Difficulty of Course when you present the evidence that the received recommendation letter was good? In addition, now present the evidence that both SAT scores were good and the letter of recommendation was good, What is the probability of the Difficulty of Course now? You should use the gRain package in R to build your network and perform these calculations. You need to install the RBGL package from BioConductor in R to get gRain working.
library(gRain)
no_yes <- c("No","Yes")
low_high <- c("Low", "High")
D <- cptable(~Difficulty, values = c(0.3, 0.7), levels = no_yes)
I <- cptable(~Intelligence, values = c(0.8, 0.2), levels = low_high)
G.DI <- cptable(~Grade | Difficulty:Intelligence,
values = c(0.6, 0.4, 0.8, 0.2, 0.01, 0.99, 0.1, 0.9), levels = low_high)
S.I <- cptable(~SAT | Intelligence,
values = c(0.9, 0.1, 0.2, 0.8), levels = low_high)
L.G <- cptable(~Letter | Grade,
values = c(0.9, 0.1, 0.05, 0.95), levels = low_high)
(list(D, I, G.DI, S.I, L.G))
## [[1]]
## {v,pa(v)} : chr "Difficulty"
## <NA>
## No 0.3
## Yes 0.7
##
## [[2]]
## {v,pa(v)} : chr "Intelligence"
## <NA>
## Low 0.8
## High 0.2
##
## [[3]]
## {v,pa(v)} : chr [1:3] "Grade" "Difficulty" "Intelligence"
## <NA> <NA> <NA> <NA>
## Low 0.6 0.8 0.01 0.1
## High 0.4 0.2 0.99 0.9
##
## [[4]]
## {v,pa(v)} : chr [1:2] "SAT" "Intelligence"
## <NA> <NA>
## Low 0.9 0.2
## High 0.1 0.8
##
## [[5]]
## {v,pa(v)} : chr [1:2] "Letter" "Grade"
## <NA> <NA>
## Low 0.9 0.05
## High 0.1 0.95
(cptlist <- compileCPT(list(D, I, G.DI, S.I, L.G)))
## CPTspec with probabilities:
## P( Difficulty )
## P( Intelligence )
## P( Grade | Difficulty Intelligence )
## P( SAT | Intelligence )
## P( Letter | Grade )
(Network <- grain(cptlist))
## Independence network: Compiled: FALSE Propagated: FALSE
## Nodes: chr [1:5] "Difficulty" "Intelligence" "Grade" "SAT" ...
par(mfrow=c(1,2))
plot.grain(Network)
iplot.grain(Network)
5. Query the independence network, i.e. obtain the conditional distribution of a set of variables given finding (evidence) on other variables with setEvidence and querygrain.
Evidence_L <- setEvidence(Network, evidence = list(Letter="High"))
querygrain(Evidence_L, nodes="Difficulty")
## $Difficulty
## Difficulty
## No Yes
## 0.3731439 0.6268561
Evidence_SL <- setEvidence(Network, evidence = list(Letter="High", SAT="High"))
querygrain(Evidence_SL, nodes="Difficulty")
## $Difficulty
## Difficulty
## No Yes
## 0.3323478 0.6676522