Your colleague either commutes by train or by the bus. 20 days of the month, she takes the train and the remaining 10 days she takes the bus. If she takes the train, she reaches work on time with a probability of 0.9. If she takes the bus, she frequently gets stuck in traffic and reaches work on time with a probability of 0.5. Given that she was on time today, what is the probability that she took the bus to work today?
This problem is an application of Baye’s theorem that can be stated as follows:
\(P(H/E)\quad =\quad \frac { P(H).P(E/H) }{ P(E) }\), where H is an hypothesis and E is Evidence.
In our scenario above, we have the following;
\(P({ H }_{ 1 })\) = colleague took the train
\(P({ H }_{ 2 })\) = Our colleague took the bus
E = Colleague arrives on time at work
From the problem, we are trying to find the probability of “Our colleague took the bus” (i.e. \({ H }_{ 2 }\))
given that she arrived on time (i.e. E). Hence we are looking for \(P({ H }_{ 2 }/E)\). Applying Baye’s theorem,
this can be written as;
\(P({ H }_{ 2 }/E)\quad =\quad \frac { P({ H }_{ 2 }).P(E/{ H }_{ 2 }) }{ P(E) }\)
Let us now determine the following probability: \(P({ H }_{ 2 })\),\(P(E/H_{ 2 })\), and \(P(E)\)
Our colleague takes the bus 10 days out of 30, hence \(P({ H }_{ 2 })\) = 10/30 = 1/3. (Similarly \(P({ H }_{ 1 })\) = 20/30 = 2/3).
The probability of our colleague arriving on time given that she takes the bus has been given to us 0.5. Hence \(P(E/H_{ 2 })\) = 1/2
Finally, the probability that our colleague is on time at work (\(P(E)\)), can be determined by sum of the possibilities for our colleague to arrive on time.
i.e. Either travelling by train or by bus. Hence,
\(P(E)\quad =\quad P({ H }_{ 1 }).P(E/{ H }_{ 1 })\quad +\quad P({ H }_{ 2 }).P(E/{ H }_{ 2 })\)
by substituing we get;
\(P(E)\quad =\quad \frac { 2 }{ 3 } .\frac { 9 }{ 10 } \quad +\quad \frac { 1 }{ 3 } .\frac { 1 }{ 2 } =\quad \frac { 3 }{ 5 } \quad +\quad \frac { 1 }{ 6 } \quad =\quad \frac { 18\quad +\quad 5 }{ 30 } \quad =\quad \frac { 23 }{ 30 }\)
By substituing all the values back into our formula, we get:
\(P({ H }_{ 2 }/E)\quad =\quad \frac { \frac { 1 }{ 3 } .\frac { 1 }{ 2 } }{ \frac { 23 }{ 30 } } \quad =\quad \frac { \frac { 1 }{ 6 } }{ \frac { 23 }{ 30 } } \quad =\quad \frac { 1 }{ 6 } .\frac { 30 }{ 23 } \quad =\quad \frac { 5 }{ 23 } \quad =\quad 0.2174\)
In the Grade Network that we looked at in the notes, what happens to the probability of
Difficulty of Course when you present the evidence that the received recommendation letter was good?
In addition, now present the evidence that both SAT scores were good and
the letter of recommendation was good, What is the probability of the Difficulty of Course now?
In order to answer these questions, we will construct a “Grade” Network using the package gRain.
We are given the following graph and table of probilities.
Grade Network
Yes | No |
---|---|
0.7 | 0.3 |
High | Low |
---|---|
0.2 | 0.8 |
Diff/Int | High | Low |
---|---|---|
D=Y, I=H | 0.9 | 0.1 |
D=Y, I=L | 0.2 | 0.8 |
D=N, I=H | 0.99 | 0.01 |
D=N, I=L | 0.4 | 0.6 |
Grade | High | Low |
---|---|---|
High | 0.95 | 0.05 |
Low | 0.1 | 0.9 |
Intelligence | High | Low |
---|---|---|
High | 0.8 | 0.2 |
Low | 0.1 | 0.9 |
We will now set-up the Graphical Network using gRain package in R.
library(gRain)
## Loading required package: gRbase
# Initialize vectors to used as levels (Yes/No) and (High/Low)
yn <- c("Yes", "No")
hl <- c("High", "Low")
# Set-up each table with appropriate values and levell
d <- cptable(~ difficulty, values = c(0.7, 0.3), levels = yn)
i <- cptable(~ intelligence, values = c(0.2, 0.8), levels = hl)
g <- cptable(~ grade | intelligence : difficulty, values = c(0.9, 0.1, 0.2, 0.8, 0.99, 0.01, 0.4, 0.6), levels = hl)
l <- cptable(~ letter | grade, values = c(0.95, 0.05, 0.1, 0.9), levels=hl)
s <- cptable(~ SAT | intelligence, values = c(0.8, 0.2, 0.1, 0.9), levels = hl)
# Compile Model
glist <- compileCPT(list(d, i, g, l, s))
gnet <- grain(glist)
# Display Values
glist
## CPTspec with probabilities:
## P( difficulty )
## P( intelligence )
## P( grade | intelligence difficulty )
## P( letter | grade )
## P( SAT | intelligence )
gnet
## Independence network: Compiled: FALSE Propagated: FALSE
## Nodes: chr [1:5] "difficulty" "intelligence" "grade" "letter" ...
glist$difficulty
## difficulty
## Yes No
## 0.7 0.3
## attr(,"class")
## [1] "parray" "array"
glist$intelligence
## intelligence
## High Low
## 0.2 0.8
## attr(,"class")
## [1] "parray" "array"
glist$grade
## , , difficulty = Yes
##
## intelligence
## grade High Low
## High 0.9 0.2
## Low 0.1 0.8
##
## , , difficulty = No
##
## intelligence
## grade High Low
## High 0.99 0.4
## Low 0.01 0.6
##
## attr(,"class")
## [1] "parray" "array"
glist$letter
## grade
## letter High Low
## High 0.95 0.1
## Low 0.05 0.9
## attr(,"class")
## [1] "parray" "array"
glist$SAT
## intelligence
## SAT High Low
## High 0.8 0.1
## Low 0.2 0.9
## attr(,"class")
## [1] "parray" "array"
# plot Model
iplot(gnet)
# Querying Model
querygrain(gnet, nodes=c("difficulty", "intelligence", "grade", "letter", "SAT"))
## $difficulty
## difficulty
## Yes No
## 0.7 0.3
##
## $intelligence
## intelligence
## High Low
## 0.2 0.8
##
## $grade
## grade
## High Low
## 0.3934 0.6066
##
## $letter
## letter
## High Low
## 0.43439 0.56561
##
## $SAT
## SAT
## High Low
## 0.24 0.76
# Setting Event and Querying model again. To answer first question, we have evidence that letter received was good (state = High)
gnet.ev_l <- setEvidence(gnet, nodes = "letter", states = "High")
querygrain(gnet.ev_l, nodes = "difficulty")
## $difficulty
## difficulty
## Yes No
## 0.6268561 0.3731439
# Setting 2nd Event and Querying model again. To answer 2nd question, we have evidence that both SAT scores were good and letter was good
gnet.ev_ls <- setEvidence(gnet, nodes = c( "SAT", "letter"), states = c("High", "High"))
querygrain(gnet.ev_ls, nodes = "difficulty")
## $difficulty
## difficulty
## Yes No
## 0.6676522 0.3323478