Problem Set 1

Your colleague either commutes by train or by the bus. 20 days of the month, she takes the train and the remaining 10 days she takes the bus. If she takes the train, she reaches work on time with a probability of 0.9. If she takes the bus, she frequently gets stuck in tra???c and reaches work on time with a probability of 0.5. Given that she was on time today, what is the probability that she took the bus to work today?

We know the following probabilities:

Given that she was on time today, what is the probability that she took the bus to work today?

Using Bayes’ Rule:

\[ p(bus|on\_time) = \frac{p(bus) * p(on\_time|bus)} { p(bus) * p(on\_time|bus) + p(train) * p(on\_time|train)} \]

\[ p(bus|on\_time) = \frac{.3333 * 0.5} { (.3333 * 0.5) + (.6666 * 0.9)} \]

\[ p(bus|on\_time) = 0.2173913 \]

Given that our colleague was on time today, the probability that she took the bus to work today is 0.2173913.

Problem Set 2

In the Grade Network that we looked at in the notes, what happens to the probability of Difficulty of Course when you present the evidence that the received recommendation letter was good? In addition, now present the evidence that both SAT scores were good and the letter of recommendation was good, What is the probability of the Difficulty of Course now? You should use the gRain package in R to build your network and perform these calculations. You may need to install RBGL package from BioConductor in R to get gRain working. See http://www.bioconductor.org/packages/release/bioc/html/RBGL.html for instructions on RBGL. Please submit your assignment as an R markdown document.

Reference for the gRain package: https://cran.r-project.org/web/packages/gRain/vignettes/gRain-intro.pdf

Installation

The packages graph, RBGL and Rgraphviz are not on CRAN but on bioconductor.

To install these packages, execute

Then install the packages from CRAN in the usual way:

  • install.packages(“gRain”, dependencies=TRUE)

Using gRain

In the Grade Network, we have 5 random variables (D,I,G,S,L) - which correspond to Difficulty, Intelligence, Grade, SAT Score, Letter of Recommendation

This graph represents a Bayesian network and factors the joint probability function:

\[p(d,i,g,s,l) = p(d) p(i) p(g|i,d) p(s|i) p(l|g)\]

suppressWarnings(suppressMessages(library(gRain)))

# levels
no_yes <- c("no", "yes")
low_high <- c("low","high")
 
##
## Conditional Probabilities
##
 
# difficulty
difficulty <- cptable(~difficulty, values=c(0.3, 0.7), levels = no_yes )
 
# intelligence
intelligence <- cptable(~intelligence, values=c(0.8, 0.2), levels=low_high)
 
 # sat given intelligence
sat.intelligence <- cptable(~sat|intelligence, values=c(0.9, 0.1, 0.2, 0.8), levels=low_high)
 
grade.intelligence_difficulty <- 
     cptable(~grade|intelligence:difficulty,values=c(0.6, 0.4, 0.01, 0.99, 0.8, 0.2, 0.1, 0.9 ),levels=low_high)


letter.grade <- cptable(~letter|grade,values=c(0.9, 0.1, 0.05, 0.95),levels=low_high)

plist <- 
    compileCPT(list(difficulty, intelligence, sat.intelligence, grade.intelligence_difficulty, letter.grade))
 
plist
## CPTspec with probabilities:
##  P( difficulty )
##  P( intelligence )
##  P( sat | intelligence )
##  P( grade | intelligence difficulty )
##  P( letter | grade )

Let’s look at the Grade Network graph:

network <- grain(plist)
plot(network)

Check the conditional probabilities in the Bayes Network:

Difficulty

## difficulty
##  no yes 
## 0.3 0.7

Intelligence

## intelligence
##  low high 
##  0.8  0.2

SAT

##       intelligence
## sat    low high
##   low  0.9  0.2
##   high 0.1  0.8

Grade

## , , difficulty = no
## 
##       intelligence
## grade  low high
##   low  0.6 0.01
##   high 0.4 0.99
## 
## , , difficulty = yes
## 
##       intelligence
## grade  low high
##   low  0.8  0.1
##   high 0.2  0.9

Letter

##       grade
## letter low high
##   low  0.9 0.05
##   high 0.1 0.95

Let’s query all the nodes in the network:

(querygrain(network, nodes=c("difficulty", "intelligence", "sat", "grade", "letter"), type="marginal"))
## $difficulty
## difficulty
##  no yes 
## 0.3 0.7 
## 
## $intelligence
## intelligence
##  low high 
##  0.8  0.2 
## 
## $grade
## grade
##    low   high 
## 0.6066 0.3934 
## 
## $sat
## sat
##  low high 
## 0.76 0.24 
## 
## $letter
## letter
##     low    high 
## 0.56561 0.43439

What happens to the probability of Difficulty of Course when you present the evidence that the received recommendation letter was good?

bn1 <- setEvidence(network, "letter", "high")

(querygrain(bn1, "difficulty", type="marginal"))
## $difficulty
## difficulty
##        no       yes 
## 0.3731439 0.6268561

In addition, now present the evidence that both SAT scores were good and the letter of recommendation was good, what is the probability of the Difficulty of Course now?

bn2 <- setEvidence(network,  c("sat", "letter"), c("high", "high"))

(querygrain(bn2, "difficulty"))
## $difficulty
## difficulty
##        no       yes 
## 0.3323478 0.6676522