1. Problem set 1

Your colleague either commutes by train or by the bus. 20 days of the month, she takes the train and the remaining 10 days she takes the bus. If she takes the train, she reaches work on time with a probability of \(0.9\). If she takes the bus, she frequently gets stuck in traffic and reaches work on time with a probability of \(0.5\). Given that she was on time today, what is the probability that she took the bus to work today?

\[P\left( T \right) =\frac { 20 }{ 20+10 }, \quad P\left( B \right) =\frac { 10 }{ 20+10 } =1-P\left( T \right)\]

\[P\left( { OT }|{ T } \right) =0.9, \quad P\left( { OT }|{ B } \right) =0.5\]

p_T <- 20 / (20 + 10)
p_B <- 1 - p_T
p_OTT <- 0.9
p_OTB <- 0.5

\[P(B|OT)=\frac { P\left( { OT }|B \right) P\left( B \right) }{ P\left( { OT }|{ T } \right) P\left( T \right) +P\left( { OT }|B \right) P\left( B \right) }\]

(p_BOT <- (p_OTB * p_B) / (p_OTT * p_T + p_OTB * p_B))

## [1] 0.2173913

\[P(B|OT)=\frac { 0.5\left( \frac { 10 }{ 20+10 } \right) }{ 0.9\left( \frac { 20 }{ 20+10 } \right) +0.5\left( \frac { 10 }{ 20+10 } \right) } = 0.2173913\]

2. Problem set 2

In the Grade Network that we looked at in the notes:

We have 5 random variables. \((D,I,G,S,L)\) – which correspond to Diffculty, Intelligence, Grade, SAT Score, Letter of Recommendation. If the student gets a good grade in that course, the student has a higher probability of receiving a good recommendation letter. Likewise, if the student in Intelligent, they have a higher chance of receiving a good SAT score. In general, higher the intelligence, the better their course grade and more difficult the course, the harder it is to get a good grade. This information can be encoded in the figure shown below. In addition, the conditional probability tables are shown next to the corresponding nodes. This graph represents a Bayesian network and factors the joint probability function \(p(d,i,g,s,l) = p(d)p(i)p(g|i,d)p(s|i)p(l|g)\).

What happens to the probability of Difficulty of Course when you present the evidence that the received recommendation letter was good? In addition, now present the evidence that both SAT scores were good and the letter of recommendation was good, What is the probability of the Difficulty of Course now? You should use the gRain package in R to build your network and perform these calculations. You need to install the RBGL package from BioConductor in R to get gRain working.

Recreate Grade Network

library(gRain)

Create conditional probability table (CPT) of the form \(p(v|pa(v))\) with cptable.

no_yes <- c("No","Yes")
low_high <- c("Low", "High")
D <- cptable(~Difficulty, values = c(0.3, 0.7), levels = no_yes)
I <- cptable(~Intelligence, values = c(0.8, 0.2), levels = low_high)
G.DI <- cptable(~Grade | Difficulty:Intelligence, 
                values = c(0.6, 0.4, 0.8, 0.2, 0.01, 0.99, 0.1, 0.9), levels = low_high)
S.I <- cptable(~SAT | Intelligence, 
               values = c(0.9, 0.1, 0.2, 0.8), levels = low_high)
L.G <- cptable(~Letter | Grade, 
               values = c(0.9, 0.1, 0.05, 0.95), levels = low_high)
(list(D, I, G.DI, S.I, L.G))

## [[1]]
## {v,pa(v)} : chr "Difficulty"
##     <NA>
## No   0.3
## Yes  0.7
## 
## [[2]]
## {v,pa(v)} : chr "Intelligence"
##      <NA>
## Low   0.8
## High  0.2
## 
## [[3]]
## {v,pa(v)} : chr [1:3] "Grade" "Difficulty" "Intelligence"
##      <NA> <NA> <NA> <NA>
## Low   0.6  0.8 0.01  0.1
## High  0.4  0.2 0.99  0.9
## 
## [[4]]
## {v,pa(v)} : chr [1:2] "SAT" "Intelligence"
##      <NA> <NA>
## Low   0.9  0.2
## High  0.1  0.8
## 
## [[5]]
## {v,pa(v)} : chr [1:2] "Letter" "Grade"
##      <NA> <NA>
## Low   0.9 0.05
## High  0.1 0.95

Compile conditional probability tables / cliques potentials as a preprocessing step for creating a graphical independence network with compileCPT.

(cptlist <- compileCPT(list(D, I, G.DI, S.I, L.G)))

## CPTspec with probabilities:
##  P( Difficulty )
##  P( Intelligence )
##  P( Grade | Difficulty Intelligence )
##  P( SAT | Intelligence )
##  P( Letter | Grade )

Build a graphical independence network with grain.

(Network <- grain(cptlist))

## Independence network: Compiled: FALSE Propagated: FALSE 
##   Nodes: chr [1:5] "Difficulty" "Intelligence" "Grade" "SAT" ...

Plot the graphical independence network with plot.grain and iplot.grain.

par(mfrow=c(1,2))
plot.grain(Network)
iplot.grain(Network)

5. Query the independence network, i.e. obtain the conditional distribution of a set of variables given finding (evidence) on other variables with setEvidence and querygrain.

Difficulty of Course given evidence that recommendation letter was good.

Evidence_L <- setEvidence(Network, evidence = list(Letter="High"))
querygrain(Evidence_L, nodes="Difficulty")

## $Difficulty
## Difficulty
##        No       Yes 
## 0.3731439 0.6268561

Difficulty of Course given evidence that SAT scores were good and recommendation letter was good.

Evidence_SL <- setEvidence(Network, evidence = list(Letter="High", SAT="High"))
querygrain(Evidence_SL, nodes="Difficulty")

## $Difficulty
## Difficulty
##        No       Yes 
## 0.3323478 0.6676522

References

ⁱ https://cran.r-project.org/web/packages/gRain/gRain.pdf

DATA 605 Assignment 8

Jose Zuniga

1. Problem set 1

2. Problem set 2

Recreate Grade Network

References