DATA 605 - Assignment #8

PROBLEM SET 1

Your colleague either commutes by train or by the bus. 20 days of the month, she takes the train and the remaining 10 days she takes the bus. If she takes the train, she reaches work on time with a probability of 0.9. If she takes the bus, she frequently gets stuck in traffic and reaches work on time with a probability of 0.5. Given that she was on time today, what is the probability that she took the bus to work today?

A tree diagram is set up to understand the probabilities better.

We are asked to find P(Bus | On time)? According to Bayes’ Theorm of inverting probabilities, we get:

\({P(Bus | On time)}\) = \(\frac{P(On time | Bus) * P(Bus)}{P(On time | Bus) * P(Bus) + P(On time | Train) * P(Train)}\)

# probability bus is taken
p_bus <- 10/30

# probability train is taken
p_train <- 20/30

# probability of on time if bus is taken
p_ontime_given_bus <- 0.5

# probability of on time if train is taken
p_ontime_given_train <- 0.9

# probability of taking a bus if arrived on time
p_bus_given_ontime <- (p_ontime_given_bus * p_bus)/((p_ontime_given_bus * p_bus) + (p_ontime_given_train * p_train))

p_bus_given_ontime

## [1] 0.2173913

\({P(Bus | On time)}\) = 21.74%

PROBLEM SET 2

Setting up Grade Network in R using gRain library.

library("gRain")

## Loading required package: gRbase

# setting up conditional probability tables with values given
ny <- c("N","Y")
lh <- c("L","H")

d <- cptable(~ difficulty, values = c(0.3, 0.7), levels = ny)
i <- cptable(~ intelligence, values = c(0.8, 0.2), levels = lh)
g.di <- cptable(~ grade + difficulty + intelligence, values = c(0.6, 0.4, 0.8, 0.2, 0.01, 0.99, 0.1, 0.9), levels = lh )
l.g <- cptable(~ letter + grade, values= c(0.9, 0.1, 0.05, 0.95), levels = lh)
s.i <- cptable(~  sat + intelligence, values = c(0.9, 0.1, 0.2, 0.8), levels = lh)

# create list of conditional probability tables
plist <- compileCPT(list(d,i,g.di,l.g,s.i))
plist$difficulty

## difficulty
##   N   Y 
## 0.3 0.7

plist$intelligence

## intelligence
##   L   H 
## 0.8 0.2

plist$grade

## , , intelligence = L
## 
##      difficulty
## grade   N   Y
##     L 0.6 0.8
##     H 0.4 0.2
## 
## , , intelligence = H
## 
##      difficulty
## grade    N   Y
##     L 0.01 0.1
##     H 0.99 0.9

plist$letter

##       grade
## letter   L    H
##      L 0.9 0.05
##      H 0.1 0.95

plist$sat

##    intelligence
## sat   L   H
##   L 0.9 0.2
##   H 0.1 0.8

# create network from list
network <- grain(plist)

(I) What happens to the probability of Difficulty of Course when you present the evidence that the received recommendation letter was good?

# Set findings and the query
network.find <- setFinding(network, nodes = "letter", state = "H")
diff.course <- querygrain(network.find, nodes = "difficulty")
diff.course

## $difficulty
## difficulty
##         N         Y 
## 0.3731439 0.6268561

The probability of difficulty of courses decreases by 7%.

(II) In addition, now present the evidence that both SAT scores were good and the letter of recommendation was good, What is the probability of the Difficulty of Course now?

# Set findings and the query
network.find <- setFinding(network, nodes = c("letter", "sat"), state = c("H","H"))
diff.course <- querygrain(network.find, nodes = "difficulty")
diff.course

## $difficulty
## difficulty
##         N         Y 
## 0.3323478 0.6676522

With good SAT scores, the probability of difficulty of course increases by 4% than just having a good recommendation letter.

DATA 605 - Assignment #8

Puneet Auluck

October 15, 2016

PROBLEM SET 1

PROBLEM SET 2

Setting up Grade Network in R using gRain library.