Some basic probability terminology to know:
P(A) : Probability of event A
P(A|B) : Probability of event A given event B has occurred
P(A’) : Probability of complement of event A, equal to (1-P(A)).
P(A ∩ B): Probability that events A and B both occur (their intersection).
Bayes’ theorem:
The theorem builds on the conditional probability formula, which lets us calculate the probability of one event given that another event has happened. Bayes’ theorem takes this idea further: it is a formula that lets us calculate a conditional probability, say P(A|B), without knowing every quantity that the conditional probability formula requires. For example,
suppose we want to find the probability that Microsoft’s stock (MSFT) price falls, given that the Dow Jones index (DJIA) has already fallen. Using conditional probability, we calculate it in the following way:
\[ P(MSFT|DJIA) = \frac {P(MSFT ∩ DJIA)}{P(DJIA)} \]
P(MSFT ∩ DJIA) is the probability that both MSFT and DJIA fall in price. By the multiplication rule, it equals the probability that DJIA falls given that MSFT falls, multiplied by the probability that MSFT falls: P(MSFT ∩ DJIA) = P(DJIA|MSFT) * P(MSFT). So now we can write the formula in the following form:
\[ P(MSFT|DJIA) = \frac {P(DJIA|MSFT)*P(MSFT)}{P(DJIA)} \]
So what is the significance of these two equations being equal? When we calculate probabilities in real life, it is sometimes easier to obtain P(A|B) and sometimes easier to obtain P(B|A). Because P(A ∩ B) and P(B ∩ A) are the same quantity, we have P(A|B) * P(B) = P(B|A) * P(A), which lets us calculate one conditional probability from the other without knowing everything about the events. This is what Bayes’ theorem tells us.
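The equality can be checked numerically. The following is a minimal sketch in R; the three input probabilities are hypothetical values chosen only for illustration, not real market data.

```r
# Hypothetical inputs, for illustration only (not real market data)
p_msft <- 0.30             # P(MSFT): MSFT falls
p_djia <- 0.25             # P(DJIA): DJIA falls
p_djia_given_msft <- 0.60  # P(DJIA | MSFT)

# Multiplication rule: P(MSFT and DJIA) = P(DJIA | MSFT) * P(MSFT)
p_joint <- p_djia_given_msft * p_msft

# Conditional probability formula: P(MSFT | DJIA) = P(MSFT and DJIA) / P(DJIA)
p_msft_given_djia <- p_joint / p_djia

# The same joint probability, factored the other way round
p_joint_other <- p_msft_given_djia * p_djia

p_msft_given_djia
isTRUE(all.equal(p_joint, p_joint_other))
```

Both factorizations give the same joint probability, which is why knowing P(DJIA|MSFT) is enough to recover P(MSFT|DJIA).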
Bayes’ theorem formula
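For reference, the general form of the theorem for two events A and B with P(B) > 0, written in the notation from the terminology list at the start:

\[ P(A|B) = \frac{P(B|A)*P(A)}{P(B)} \]

The numerator is the joint probability P(A ∩ B) expressed through the multiplication rule, so this is the conditional probability formula with P(A ∩ B) replaced by P(B|A) * P(A).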
Question:
Jose visits campus every Thursday evening. However, some days the parking garage is full, often due to college events. There are academic events on 35% of evenings, sporting events on 20% of evenings, and no events on 45% of evenings. When there is an academic event, the garage fills up about 25% of the time, and it fills up 70% of evenings with sporting events. On evenings when there are no events, it only fills up about 5% of the time. If Jose comes to campus and finds the garage full, what is the probability that there is a sporting event? Use a tree diagram to solve this problem.
Solving by hand
Step 1: List all the given information in probability notation
P(A) = Probability of academic event = 0.35
P(S) = Probability of sporting event = 0.2
P(N) = Probability of no event = 0.45
P(F|A) = Probability of a full garage given academic event = 0.25
P(F|S) = Probability of a full garage given sporting event = 0.7
P(F|N) = Probability of a full garage given no event = 0.05
We need to find P(S|F), the probability of a sporting event given that the garage is full.
Step 2 : Apply Bayes’ theorem
\[ P(S|F) = \frac{P(F|S)*P(S)}{P(F)} \]
We know each of the following probabilities
P(F|S) = Probability of a full garage given sporting event = 0.7
P(S) = Probability of sporting event = 0.2
\[ P(F|S)*P(S) = 0.7*0.2 = 0.14 \]
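Now we need to find P(F), the overall probability that the garage is full.
By the law of total probability, P(F) is the sum over the three kinds of evening: \[ P(F) = P(A)P(F|A) + P(S)P(F|S) + P(N)P(F|N) = 0.35*0.25 + 0.2*0.7 + 0.45*0.05 = 0.0875 + 0.14 + 0.0225 = 0.25 \]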
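The sum for P(F) can be reproduced in a couple of lines of R. This is a minimal sketch; the vector names `prior` and `likelihood` are introduced here for illustration and hold the values from Step 1.

```r
# Values from Step 1 of the question
prior      <- c(academic = 0.35, sporting = 0.20, none = 0.45)  # P(A), P(S), P(N)
likelihood <- c(academic = 0.25, sporting = 0.70, none = 0.05)  # P(F|A), P(F|S), P(F|N)

# Law of total probability: sum of P(event) * P(F | event) over all events
p_full <- sum(prior * likelihood)
p_full
```

The result should match the value of P(F) used in the denominator of Bayes’ theorem.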
Now we can plug in the values:
\[ P(S|F) = \frac{P(F|S)*P(S)}{P(F)} = \frac{0.14}{0.25} = 0.56 \]
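Conclusion: Given that Jose finds the garage full, the probability that there is a sporting event is 0.56, or 56%.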
Solving by code
First, let’s begin by writing Bayes’ theorem as a function
# Define Bayes' theorem as a function
# pA: P(A), pB: P(B), pBA: P(B|A); returns P(A|B)
bayes_theorem <- function(pA, pB, pBA) {
  pAB <- (pBA * pA) / pB
  return(pAB)
}
Now that the function is built, we will begin our calculation by plugging in the probabilities we obtained from the question
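# Define probabilities
pSport <- 0.2       # P(S): probability of a sporting event
pFull_Sport <- 0.7  # P(F|S): probability of a full garage given a sporting event
pFull <- 0.25       # P(F): overall probability of a full garage
# Apply the function built above
bayes_theorem(pA = pSport,
              pB = pFull,
              pBA = pFull_Sport)
## [1] 0.56
We have now obtained the same answer as we got earlier by hand.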
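The function above needs P(F) to be worked out beforehand. As a possible variation, the sketch below computes P(F) internally and returns the posterior probability of every event at once. The function name `posterior_all` and its arguments are hypothetical, introduced here for illustration only.

```r
# Hypothetical helper: posterior probability of each event given a full garage
# prior:      named vector of P(event), must sum to 1
# likelihood: named vector of P(F | event), in the same order as prior
posterior_all <- function(prior, likelihood) {
  stopifnot(length(prior) == length(likelihood),
            isTRUE(all.equal(sum(prior), 1)))
  joint <- prior * likelihood  # P(event and F) for each event
  joint / sum(joint)           # divide by P(F), the sum of the joints
}

prior      <- c(academic = 0.35, sporting = 0.20, none = 0.45)
likelihood <- c(academic = 0.25, sporting = 0.70, none = 0.05)

post <- posterior_all(prior, likelihood)
round(post, 3)
```

The `sporting` entry should agree with the result of `bayes_theorem()`, and the three posteriors should sum to 1 because one of the three kinds of evening must have occurred.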
Tree diagram
#Package needed
library(BiocManager)
BiocManager::install("Rgraphviz")
library("Rgraphviz")
# Define terms and probabilities
a <- 0.35 # probability of academic event
s <- 0.20 # probability of sporting event
n <- 0.45 # probability of no event
fa <- 0.25 # probability of a full garage, given academic event occurs
fs <- 0.7 # probability of a full garage, given sporting event occurs
fn <- 0.05 # probability of a full garage, given no event occurs
# Complementary probabilities calculation
notB_givenA1 <- 1 - fa # probability of a not full garage, given academic event occurs
notB_givenA2 <- 1 - fs # probability of a not full garage, given sporting event occurs
notB_givenA3 <- 1 - fn # probability of a not full garage, given no event occurs
# Joint probabilities calculation
## An event occurs and the garage is full
a_fa <- a * fa # P(A ∩ F)
s_fs <- s * fs # P(S ∩ F)
## No event occurs and the garage is full
n_fn <- n * fn # P(N ∩ F)
## The garage is not full
a_notB_givenA1 <- a * notB_givenA1 # P(A ∩ F')
s_notB_givenA2 <- s * notB_givenA2 # P(S ∩ F')
n_notB_givenA3 <- n * notB_givenA3 # P(N ∩ F')
# Check values
list(a_fa,
s_fs,
n_fn,
a_notB_givenA1,
s_notB_givenA2,
n_notB_givenA3)
## [[1]]
## [1] 0.0875
##
## [[2]]
## [1] 0.14
##
## [[3]]
## [1] 0.0225
##
## [[4]]
## [1] 0.2625
##
## [[5]]
## [1] 0.06
##
## [[6]]
## [1] 0.4275
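As a quick sanity check on these six values, the leaf probabilities of the tree should sum to 1, and each pair of leaves should add back up to the probability of its parent branch. A minimal self-contained sketch (the vector names `prior` and `p_full_given` are introduced here for illustration):

```r
# Branch probabilities and P(full | event), as given in the question
prior        <- c(a = 0.35, s = 0.20, n = 0.45)
p_full_given <- c(a = 0.25, s = 0.70, n = 0.05)

# Leaf (joint) probabilities of the tree
joint_full     <- prior * p_full_given        # event and full garage
joint_not_full <- prior * (1 - p_full_given)  # event and garage not full

# All six leaves together should cover every possible outcome
total <- sum(joint_full) + sum(joint_not_full)
isTRUE(all.equal(total, 1))

# Each pair of leaves should add back up to its parent branch
isTRUE(all.equal(joint_full + joint_not_full, prior))
```

If either check returned `FALSE`, one of the input probabilities or complements would have been entered incorrectly.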
#####
## Tree Construction
#####
# Labels on graph
node1 <- "P"
node2 <- "a"
node3 <- "s"
node4 <- "n"
node5 <- "fa"
node6 <- "notB_givenA1"
node7 <- "fs"
node8 <- "notB_givenA2"
node9 <- "fn"
node10 <- "notB_givenA3"
nodeNames <- c(node1, node2, node3, node4, node5, node6, node7, node8, node9, node10)
# Creating tree structure
rEG <- new("graphNEL",
nodes = nodeNames,
edgemode="directed"
)
# Connecting different nodes with lines / Creating branches of the tree
rEG <- addEdge (nodeNames[1], nodeNames[2], rEG, 1)
rEG <- addEdge (nodeNames[1], nodeNames[3], rEG, 1)
rEG <- addEdge (nodeNames[1], nodeNames[4], rEG, 1)
rEG <- addEdge (nodeNames[2], nodeNames[5], rEG, 1)
rEG <- addEdge (nodeNames[2], nodeNames[6], rEG, 1)
rEG <- addEdge (nodeNames[3], nodeNames[7], rEG, 1)
rEG <- addEdge (nodeNames[3], nodeNames[8], rEG, 1)
rEG <- addEdge (nodeNames[4], nodeNames[9], rEG, 1)
rEG <- addEdge (nodeNames[4], nodeNames[10], rEG, 10)
eAttrs <- list()
q <- edgeNames(rEG)
# PROBABILITY VALUES
## Add the probability values to the branch lines.
## The labels must follow the order of edgeNames(rEG):
## P~a, P~s, P~n, a~fa, a~notB_givenA1, s~fs, s~notB_givenA2, n~fn, n~notB_givenA3
eAttrs$label <- c(toString(a), toString(s), toString(n),
                  toString(fa), toString(notB_givenA1),
                  toString(fs), toString(notB_givenA2),
                  toString(fn), toString(notB_givenA3)
                  )
names(eAttrs$label) <- q[1:9]
edgeAttrs <- eAttrs
# Setting colors
attributes <- list(node = list(label = "foo",
fillcolor = "lightgreen",
fontsize = "20"
),
edge = list(color = "darkgreen"),
graph = list(rankdir = "LR")
)
# Plotting graph
plot(rEG, edgeAttrs = eAttrs, attrs=attributes)
# Probability labels
## Add the event probabilities, P(f), P(f') and the posterior P(s|f) as text annotations
text(80,50, paste("P(s):" ,s), cex = .9, col="darkgreen")
text(80,35, paste("P(a):" ,a), cex = .9,col="darkgreen")
text(80,20, paste("P(n):" ,n), cex = .9,col="darkgreen")
text(160,50, paste("P(f):" ,(a_fa + s_fs + n_fn)), cex = .9)
text(160,20, paste("P(f'):" , 1-(a_fa + s_fs + n_fn) ), cex = .9)
text(80,420, paste("P(s|f):" , (s_fs)/(a_fa + s_fs + n_fn)), cex = .9, col="red")
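The value of P(s|f) shown on the plot can also be cross-checked independently by simulation. The sketch below is a Monte Carlo approximation, not part of the tree construction; the number of simulated evenings and the seed are arbitrary choices made for illustration.

```r
set.seed(42)       # arbitrary seed, for reproducibility only
n_sim <- 100000    # number of simulated evenings (arbitrary)

# Draw the kind of evening according to the probabilities in the question
events <- sample(c("academic", "sporting", "none"), n_sim,
                 replace = TRUE, prob = c(0.35, 0.20, 0.45))

# For each evening, decide whether the garage is full, using P(full | event)
p_full_given <- c(academic = 0.25, sporting = 0.70, none = 0.05)
full <- runif(n_sim) < p_full_given[events]

# Among the evenings with a full garage, the share that had a sporting event
p_sport_given_full <- mean(events[full] == "sporting")
p_sport_given_full
```

The estimate should land close to 0.56, with a small deviation that depends on the seed and the number of simulated evenings.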