I. Bayes’ Theorem

Please explain Bayes’ Theorem in your own words and give an example, in fewer than 10 sentences.

Definition:

The idea behind Bayes’ rule is that it blends two types of probability: 1) classical probability, an objective approach that assumes equally likely outcomes (i.e., \(\frac{total\ possible\ successes}{total\ possible\ outcomes}\)), and 2) relative frequency probability, which calculates the chance of success from empirically observed (and also objective) data. In particular, in Bayesian statistics the probability of an individual case of success is estimated using existing empirical data about the general population: the rule applies conditional probability so that a prior probability is not only computed but also updated in light of new evidence, producing a posterior probability.
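To make the two kinds of probability above concrete, here is a minimal sketch using a fair six-sided die. The die example and the names `classical_p`, `rolls`, and `relative_freq_p` are hypothetical illustrations, not part of the assignment: the classical probability of rolling a 6 comes from counting equally likely outcomes, while the relative frequency estimate comes from simulated "observed" rolls.

```r
# Classical probability: six equally likely faces, exactly one of which is a 6
classical_p <- 1 / 6

# Relative frequency probability: proportion of 6s among simulated rolls
set.seed(42)  # fixed seed so the simulation is reproducible
rolls <- sample(1:6, size = 10000, replace = TRUE)
relative_freq_p <- mean(rolls == 6)

classical_p
relative_freq_p  # close to, but generally not exactly, 1/6
```

With more simulated rolls, the relative frequency estimate tends to settle closer to the classical value.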

Because the probability reflects patterns in the sample population, this approach is often used in recommendation systems, hierarchical models, and modeling pollster/election variability. For example, insurance companies gauge individuals’ risk and set premium rates based not only on the equal (classical) probability of an accident occurring, but more so on individuals’ personal histories, the average incident rates for people in similar age ranges, and how often incidents occur in a given geographic location (e.g., higher car insurance rates in Boston than in a small town in Wyoming).

Formula:

Write out the formula. Pick up on how to type equations in R Markdown using LaTeX notation here.

\[P(A \mid B)\ = \frac{ P(B \mid A)\ * P(A)}{P(B)}\]
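The denominator \(P(B)\) is usually not given directly; it is built from the law of total probability. Below is a minimal sketch for an event \(A\) and its complement, where \(P(B) = P(B \mid A)P(A) + P(B \mid A^c)(1 - P(A))\). The function name `bayes_rule` and the numbers in the usage line are hypothetical, chosen only to illustrate the formula.

```r
# Bayes' rule for an event A and its complement, given evidence B.
# The denominator P(B) is expanded with the law of total probability:
#   P(B) = P(B | A) * P(A) + P(B | not A) * (1 - P(A))
bayes_rule <- function(p_a, p_b_given_a, p_b_given_not_a) {
  p_b <- p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)  # P(B)
  p_b_given_a * p_a / p_b                                 # P(A | B)
}

# Hypothetical numbers for illustration only:
# prior P(A) = 0.5, P(B | A) = 0.8, P(B | not A) = 0.2
bayes_rule(p_a = 0.5, p_b_given_a = 0.8, p_b_given_not_a = 0.2)
```

Here \(P(B) = 0.4 + 0.1 = 0.5\), so the prior of 0.5 is updated to a posterior of 0.8 once \(B\) is observed.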

II. Open Stats Guided Practice 3.43

Bayes’ Theorem can yield surprising results. Take a look at Guided Practice 3.43 (p. 107) in the Open Stats textbook in the Dropbox folder and attempt to solve it using Bayes’ formula. Interpret the results. Then use the attached code to solve it via a tree diagram in R. You will need to change the initial parameters to the appropriate values. Attach your final graph and your code to your submission. If you can create the tree with an alternative package in R, please feel free to do so; the final answer will be the same, of course. For full credit, be sure to comment your code in the chunks and also describe the setup in text before every chunk. Note: for newer versions of R, you will need to install ‘BiocManager.’

QUESTION: Guided Practice 3.43: Jose visits campus every Thursday evening. However, some days the parking garage is full, often due to college events. There are academic events on 35% of evenings, sporting events on 20% of evenings, and no events on 45% of evenings. When there is an academic event, the garage fills up about 25% of the time, and it fills up 70% of evenings with sporting events. On evenings when there are no events, it only fills up about 5% of the time. If Jose comes to campus and finds the garage full, what is the probability that there is a sporting event? Use a tree diagram to solve this problem.

Solved using Bayes’ formula:

\[P(A \mid B)\ = \frac{ P(B \mid A)\ * P(A)}{P(B)}\]

\[P(Sporting\ Event \mid Garage\ Full)\ = \frac{(0.70)*(0.20)}{(0.25)*(0.35)\ +\ (0.70)*(0.20)\ +\ (0.05)*(0.45)}\]

\[P(Sporting\ Event \mid Garage\ Full)\ = 0.56 \]
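Evaluating the denominator on its own makes the arithmetic easy to check. The denominator is \(P(Garage\ Full)\), obtained from the law of total probability over academic, sporting, and no-event evenings:

\[
\begin{aligned}
P(Garage\ Full) &= (0.25)(0.35) + (0.70)(0.20) + (0.05)(0.45) \\
&= 0.0875 + 0.14 + 0.0225 = 0.25 \\
P(Sporting\ Event \mid Garage\ Full) &= \frac{0.14}{0.25} = 0.56
\end{aligned}
\]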

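# Rgraphviz is distributed through Bioconductor; install it once with
# install.packages("BiocManager"); BiocManager::install("Rgraphviz")
library("Rgraphviz")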
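# Priors: probability of each type of evening
P_A <- 0.35   # academic event
P_S <- 0.20   # sporting event
P_N <- 0.45   # no event

# Likelihoods: probability the garage is full given each type of evening
P_F_A <- 0.25
P_F_S <- 0.70
P_F_N <- 0.05

# Bayes' rule: P(S | F) = P(F | S) * P(S) / P(F),
# where P(F) comes from the law of total probability
Prob_of_full_sport <- (P_S * P_F_S) / (P_A * P_F_A + P_S * P_F_S + P_N * P_F_N)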
Prob_of_full_sport
## [1] 0.56
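To interpret the result, it helps to compute the posterior for every type of evening, not just sporting events. This is a minimal sketch with the same inputs as above, stored as named vectors; the names `prior`, `p_full_given`, `joint_full`, `p_full`, and `posterior` are my own choices.

```r
# Priors and likelihoods from Guided Practice 3.43: academic, sporting, none
prior <- c(academic = 0.35, sporting = 0.20, none = 0.45)
p_full_given <- c(academic = 0.25, sporting = 0.70, none = 0.05)

# Joint probabilities P(event AND garage full), then P(garage full) as their sum
joint_full <- prior * p_full_given
p_full <- sum(joint_full)

# Posterior P(event | garage full) for every event; these sum to 1
posterior <- joint_full / p_full
round(posterior, 4)
```

The posteriors are 0.35 (academic), 0.56 (sporting), and 0.09 (no event). A full garage raises the chance of a sporting event from 0.20 to 0.56 and lowers the chance of no event from 0.45 to 0.09, while the academic probability stays at 0.35 because \(P(F \mid A) = 0.25\) happens to equal \(P(F)\).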

Solved using a tree diagram:

Code by Luo

Explanation of the problem:

A = academic event
S = sporting event
N = no event
F = event that garage is full
P(A) = 0.35
P(S) = 0.20
P(N) = 0.45
P(F|A) = 0.25
P(F|S) = 0.70
P(F|N) = 0.05
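# Priors for the first level of the tree
a1 <- .35    # P(A): academic event
a2 <- .20    # P(S): sporting event
a3 <- .45    # P(N): no event

# Conditional probabilities that the garage is full on each branch
fa1 <- .25   # P(F | A)
fa2 <- .7    # P(F | S)
fa3 <- .05   # P(F | N)

# Complements: probability the garage is NOT full on each branch
notfa1 <- 1 - fa1
notfa2 <- 1 - fa2
notfa3 <- 1 - fa3

# Joint probabilities of each event AND a full garage
aANDfa1 <- a1 * fa1
bANDfa2 <- a2 * fa2
cANDfa3 <- a3 * fa3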

# Node labels: the root, the three event types, then a Full/Avail leaf for each
node1     <-  "Campus Parking"
node2     <-  "Academic"
node3     <-  "Sporting"
node4     <-  "None"
node5     <-  "Aca Full"
node6     <-  "Aca Avail"
node7     <-  "Sport Full"
node8     <-  "Sport Avail"
node9     <-  "None Full"
node10    <-  "None Avail"
nodeNames <- c(node1, node2, node3, node4, node5, node6, node7, node8, node9, node10)

# Directed graph whose nodes are the ten labels above
rEG <- new("graphNEL",
           nodes = nodeNames,
           edgemode = "directed"
           )

# Edges: root -> each event type, then each event type -> its Full/Avail leaves.
# Edge labels are set separately below, so every edge simply gets weight 1.
rEG <- addEdge(nodeNames[1], nodeNames[2], rEG, 1)
rEG <- addEdge(nodeNames[1], nodeNames[3], rEG, 1)
rEG <- addEdge(nodeNames[1], nodeNames[4], rEG, 1)
rEG <- addEdge(nodeNames[2], nodeNames[5], rEG, 1)
rEG <- addEdge(nodeNames[2], nodeNames[6], rEG, 1)
rEG <- addEdge(nodeNames[3], nodeNames[7], rEG, 1)
rEG <- addEdge(nodeNames[3], nodeNames[8], rEG, 1)
rEG <- addEdge(nodeNames[4], nodeNames[9], rEG, 1)
rEG <- addEdge(nodeNames[4], nodeNames[10], rEG, 1)
# Edge labels, in the same order as edgeNames() returns the edges:
# the three priors, then P(F | event) and P(not F | event) for each event
eAttrs <- list()
q <- edgeNames(rEG)
eAttrs$label <- c(toString(a1), toString(a2),
                  toString(a3), toString(fa1),
                  toString(notfa1), toString(fa2),
                  toString(notfa2), toString(fa3),
                  toString(notfa3)
                  )
names(eAttrs$label) <- q[1:9]

# Global drawing attributes: node style, blue edges, left-to-right layout
attributes <- list(node  = list(label    = "foo",
                              fillcolor = "pink",
                              fontsize  = "15"
                              ),
                   edge  = list(color   = "blue"),
                   graph = list(rankdir = "LR")
                   )

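# Draw the tree; edgeAttrs supplies the edge labels defined above
plot(rEG, edgeAttrs = eAttrs, attrs = attributes)

# Joint probability at each leaf: P(event AND full) or P(event AND not full).
# The x/y coordinates were placed by hand and may need adjusting for other device sizes.
text(578, 410, round(aANDfa1, 4), cex = .6)
text(570, 320, round(a1 * notfa1, 4), cex = .6)
text(578, 230, round(bANDfa2, 4), cex = .6)
text(570, 170, round(a2 * notfa2, 4), cex = .6)
text(578, 95, round(cANDfa3, 4), cex = .6)
text(570, 30, round(a3 * notfa3, 4), cex = .6)

# Summary: prior P(S), total probability P(F), and posterior P(S | F)
text(160, 55, paste('P(S):', a2), cex = 1.1)
text(160, 35, paste('P(F):', round(aANDfa1 + bANDfa2 + cANDfa3, 4)), cex = 1.1)
text(160, 15, paste('P(S|F):', round(bANDfa2 / (aANDfa1 + bANDfa2 + cANDfa3), 4)), cex = 1.1)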
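Because the leaf labels above rely on hand-placed coordinates, a text version of the same tree can serve as a check. This is a minimal sketch in base R (no Rgraphviz needed) that lists each root-to-leaf path with its joint probability; the data frame `leaves` and its column names are my own choices, and the values are those of Guided Practice 3.43.

```r
# One row per root-to-leaf path of the tree
leaves <- data.frame(
  event  = rep(c("Academic", "Sporting", "None"), each = 2),
  garage = rep(c("Full", "Avail"), times = 3),
  prior  = rep(c(0.35, 0.20, 0.45), each = 2),   # P(event)
  cond   = c(0.25, 0.75, 0.70, 0.30, 0.05, 0.95) # P(garage status | event)
)
leaves$joint <- leaves$prior * leaves$cond       # P(event AND garage status)

leaves
sum(leaves$joint)  # the six leaves partition all outcomes, so this is 1

# Bayes' rule read straight off the leaves: P(Sporting | Full)
full <- leaves$garage == "Full"
leaves$joint[full & leaves$event == "Sporting"] / sum(leaves$joint[full])
```

Dividing one "Full" leaf by the sum of all three "Full" leaves is exactly the tree-diagram form of Bayes’ formula, and it reproduces the 0.56 found earlier.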