CUNY DATA605 - Assignment# 8

Problem Set# 1

Your colleague either commutes by train or by the bus. 20 days of the month, she takes the train and the remaining 10 days she takes the bus. If she takes the train, she reaches work on time with a probability of 0.9. If she takes the bus, she frequently gets stuck in traffic and reaches work on time with a probability of 0.5. Given that she was on time today, what is the probability that she took the bus to work today?

This problem is an application of Baye’s theorem that can be stated as follows:

\(P(H/E)\quad =\quad \frac { P(H).P(E/H) }{ P(E) }\), where H is an hypothesis and E is Evidence.

In our scenario above, we have the following;
\(P({ H }_{ 1 })\) = colleague took the train
\(P({ H }_{ 2 })\) = Our colleague took the bus
E = Colleague arrives on time at work

From the problem, we are trying to find the probability of “Our colleague took the bus” (i.e. \({ H }_{ 2 }\))
given that she arrived on time (i.e. E). Hence we are looking for \(P({ H }_{ 2 }/E)\). Applying Baye’s theorem,
this can be written as;
\(P({ H }_{ 2 }/E)\quad =\quad \frac { P({ H }_{ 2 }).P(E/{ H }_{ 2 }) }{ P(E) }\)

Let us now determine the following probability: \(P({ H }_{ 2 })\),\(P(E/H_{ 2 })\), and \(P(E)\)

Our colleague takes the bus 10 days out of 30, hence \(P({ H }_{ 2 })\) = 10/30 = 1/3. (Similarly \(P({ H }_{ 1 })\) = 20/30 = 2/3).
The probability of our colleague arriving on time given that she takes the bus has been given to us 0.5. Hence \(P(E/H_{ 2 })\) = 1/2
Finally, the probability that our colleague is on time at work (\(P(E)\)), can be determined by sum of the possibilities for our colleague to arrive on time.
i.e. Either travelling by train or by bus. Hence,

\(P(E)\quad =\quad P({ H }_{ 1 }).P(E/{ H }_{ 1 })\quad +\quad P({ H }_{ 2 }).P(E/{ H }_{ 2 })\)

by substituing we get;
\(P(E)\quad =\quad \frac { 2 }{ 3 } .\frac { 9 }{ 10 } \quad +\quad \frac { 1 }{ 3 } .\frac { 1 }{ 2 } =\quad \frac { 3 }{ 5 } \quad +\quad \frac { 1 }{ 6 } \quad =\quad \frac { 18\quad +\quad 5 }{ 30 } \quad =\quad \frac { 23 }{ 30 }\)

By substituing all the values back into our formula, we get:

\(P({ H }_{ 2 }/E)\quad =\quad \frac { \frac { 1 }{ 3 } .\frac { 1 }{ 2 } }{ \frac { 23 }{ 30 } } \quad =\quad \frac { \frac { 1 }{ 6 } }{ \frac { 23 }{ 30 } } \quad =\quad \frac { 1 }{ 6 } .\frac { 30 }{ 23 } \quad =\quad \frac { 5 }{ 23 } \quad =\quad 0.2174\)

Problem Set# 2

In the Grade Network that we looked at in the notes, what happens to the probability of
Difficulty of Course when you present the evidence that the received recommendation letter was good?
In addition, now present the evidence that both SAT scores were good and
the letter of recommendation was good, What is the probability of the Difficulty of Course now?

In order to answer these questions, we will construct a “Grade” Network using the package gRain.
We are given the following graph and table of probilities.

Grade Network

Difficulty
Yes	No
0.7	0.3

Intelligence
High	Low
0.2	0.8

Grade
Diff/Int	High	Low
D=Y, I=H	0.9	0.1
D=Y, I=L	0.2	0.8
D=N, I=H	0.99	0.01
D=N, I=L	0.4	0.6

Letter
Grade	High	Low
High	0.95	0.05
Low	0.1	0.9

SAT
Intelligence	High	Low
High	0.8	0.2
Low	0.1	0.9

We will now set-up the Graphical Network using gRain package in R.

library(gRain)

## Loading required package: gRbase

# Initialize vectors to used as levels (Yes/No) and (High/Low)  
yn <- c("Yes", "No")
hl <- c("High", "Low")  

# Set-up each table with appropriate values and levell
d <- cptable(~ difficulty, values = c(0.7, 0.3), levels = yn)

i <- cptable(~ intelligence, values = c(0.2, 0.8), levels = hl)

g <- cptable(~ grade | intelligence : difficulty, values = c(0.9, 0.1, 0.2, 0.8, 0.99, 0.01, 0.4, 0.6), levels = hl)

l <- cptable(~ letter | grade, values = c(0.95, 0.05, 0.1, 0.9), levels=hl)

s <- cptable(~ SAT | intelligence, values = c(0.8, 0.2, 0.1, 0.9), levels = hl)  

# Compile Model
glist <- compileCPT(list(d, i, g, l, s))
gnet <- grain(glist)

# Display Values
glist

## CPTspec with probabilities:
##  P( difficulty )
##  P( intelligence )
##  P( grade | intelligence difficulty )
##  P( letter | grade )
##  P( SAT | intelligence )

gnet

## Independence network: Compiled: FALSE Propagated: FALSE 
##   Nodes: chr [1:5] "difficulty" "intelligence" "grade" "letter" ...

glist$difficulty

## difficulty
## Yes  No 
## 0.7 0.3 
## attr(,"class")
## [1] "parray" "array"

glist$intelligence

## intelligence
## High  Low 
##  0.2  0.8 
## attr(,"class")
## [1] "parray" "array"

glist$grade

## , , difficulty = Yes
## 
##       intelligence
## grade  High Low
##   High  0.9 0.2
##   Low   0.1 0.8
## 
## , , difficulty = No
## 
##       intelligence
## grade  High Low
##   High 0.99 0.4
##   Low  0.01 0.6
## 
## attr(,"class")
## [1] "parray" "array"

glist$letter

##       grade
## letter High Low
##   High 0.95 0.1
##   Low  0.05 0.9
## attr(,"class")
## [1] "parray" "array"

glist$SAT

##       intelligence
## SAT    High Low
##   High  0.8 0.1
##   Low   0.2 0.9
## attr(,"class")
## [1] "parray" "array"

# plot Model
iplot(gnet)

# Querying Model
querygrain(gnet, nodes=c("difficulty", "intelligence", "grade", "letter", "SAT"))

## $difficulty
## difficulty
## Yes  No 
## 0.7 0.3 
## 
## $intelligence
## intelligence
## High  Low 
##  0.2  0.8 
## 
## $grade
## grade
##   High    Low 
## 0.3934 0.6066 
## 
## $letter
## letter
##    High     Low 
## 0.43439 0.56561 
## 
## $SAT
## SAT
## High  Low 
## 0.24 0.76

# Setting Event and Querying model again.  To answer first question, we have evidence that letter received was good (state = High)
gnet.ev_l <- setEvidence(gnet, nodes = "letter", states = "High")

querygrain(gnet.ev_l, nodes = "difficulty")

## $difficulty
## difficulty
##       Yes        No 
## 0.6268561 0.3731439

# Setting 2nd Event and Querying model again.  To answer 2nd question, we have evidence that both SAT scores were good and letter was good

gnet.ev_ls <- setEvidence(gnet, nodes = c( "SAT", "letter"), states = c("High", "High"))
querygrain(gnet.ev_ls, nodes = "difficulty")

## $difficulty
## difficulty
##       Yes        No 
## 0.6676522 0.3323478

CUNY DATA605 - Assignment# 8

Valerie Briot

October 20, 2016

Problem Set# 1

Problem Set# 2