Bayes' Theorem: Bayes' Theorem describes how to update the probability of a hypothesis when new information or evidence becomes available. Incorporating that evidence into the calculation gives a more accurate probability estimate. The theorem is an extension of conditional probability, which gives the probability of A|B (A happening given that B happened). Bayes' Theorem lets us calculate P(A|B) from the reverse conditional probability P(B|A), together with the individual probabilities P(A) and P(B).
The formula is listed below.
\(P(A \mid B)\) = \(\displaystyle \frac{P(B\mid A)* P(A) }{P(B)}\)
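When B can occur under several mutually exclusive and exhaustive events, the denominator \(P(B)\) is commonly expanded with the law of total probability. As a sketch for three such events, labeled \(A_1, A_2, A_3\) here purely for illustration, the formula for the first event becomes:

```latex
P(A_1 \mid B) = \frac{P(B \mid A_1)\, P(A_1)}
                     {P(B \mid A_1)\, P(A_1) + P(B \mid A_2)\, P(A_2) + P(B \mid A_3)\, P(A_3)}
```

The same denominator applies to \(P(A_2 \mid B)\) and \(P(A_3 \mid B)\); only the numerator changes.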
rm(list=ls()) # empty environment
sessionInfo()
## R version 4.3.3 (2024-02-29 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 11 x64 (build 22631)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.35 R6_2.5.1 fastmap_1.1.1 xfun_0.42
## [5] cachem_1.0.8 knitr_1.45 htmltools_0.5.7 rmarkdown_2.26
## [9] lifecycle_1.0.4 cli_3.6.2 sass_0.4.8 jquerylib_0.1.4
## [13] compiler_4.3.3 rstudioapi_0.15.0 tools_4.3.3 evaluate_0.23
## [17] bslib_0.6.1 yaml_2.3.8 rlang_1.1.3 jsonlite_1.8.8
library(BiocManager)
BiocManager::install("Rgraphviz")
## Bioconductor version 3.18 (BiocManager 1.30.22), R 4.3.3 (2024-02-29 ucrt)
## Warning: package(s) not installed when version(s) same as or greater than current; use
## `force = TRUE` to re-install: 'Rgraphviz'
## Installation paths not writeable, unable to update packages
## path: C:/Program Files/R/R-4.3.3/library
## packages:
## boot, lattice
## Old packages: 'bslib', 'pkgbuild', 'processx', 'psych', 'sass'
require("Rgraphviz")
## Loading required package: Rgraphviz
## Loading required package: graph
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, aperm, append, as.data.frame, basename, cbind,
## colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
## get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
## match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
## table, tapply, union, unique, unsplit, which.max, which.min
## Loading required package: grid
BiocManager::valid()
## Warning: 5 packages out-of-date; 0 packages too new
##
## * sessionInfo()
##
## R version 4.3.3 (2024-02-29 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 11 x64 (build 22631)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] Rgraphviz_2.46.0 graph_1.80.0 BiocGenerics_0.48.1
## [4] BiocManager_1.30.22
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.35 R6_2.5.1 fastmap_1.1.1 xfun_0.42
## [5] cachem_1.0.8 knitr_1.45 htmltools_0.5.7 rmarkdown_2.26
## [9] stats4_4.3.3 lifecycle_1.0.4 cli_3.6.2 sass_0.4.8
## [13] jquerylib_0.1.4 compiler_4.3.3 rstudioapi_0.15.0 tools_4.3.3
## [17] evaluate_0.23 bslib_0.6.1 yaml_2.3.8 rlang_1.1.3
## [21] jsonlite_1.8.8
##
## Bioconductor version '3.18'
##
## * 5 packages out-of-date
## * 0 packages too new
##
## create a valid installation with
##
## BiocManager::install(c(
## "bslib", "pkgbuild", "processx", "psych", "sass"
## ), update = TRUE, ask = FALSE, force = TRUE)
##
## more details: BiocManager::valid()$too_new, BiocManager::valid()$out_of_date
The probabilities of each type of event are listed below. The corresponding probabilities that the garage is full under each type of event are defined in the code that follows.
Academic Events: 35%
Sporting Events: 20%
No events: 45%
# Sporting Event
PA1 <- .20
# Academic Event
PA2 <- .35
# No Event
PA3 <- .45
# Garage Full - Sporting Event
PB_A1 <- .70
# Garage Full - Academic Event
PB_A2 <- .25
# Garage Full - No Event
PB_A3 <- .05
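# Posterior P(Sporting Event | Garage Full) from Bayes' Theorem;
# the denominator is the total probability that the garage is full
PA1_B <- (PA1 * PB_A1) / ((PA1 * PB_A1) + (PA2 * PB_A2) + (PA3 * PB_A3))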
PA1_B
## [1] 0.56
As we can see above, the probability of there being a sporting event when Jose sees the garage full is 56%.
Our numerator is the probability of the garage being full during a sporting event multiplied by the probability of there being a sporting event:
\(P(B \mid A)* P(A)\) = 0.70 * 0.20 = 0.14
The denominator is the sum, over all three event types, of each event's probability multiplied by the probability of the garage being full for that event:
\({P(B)}\) = (0.20 * 0.70) + (0.35 * 0.25) + (0.45 * 0.05)
= 0.14 + 0.0875 + 0.0225 = 0.25
\(P(A \mid B)\) = 0.14 / 0.25 = 0.56 = 56%
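The same arithmetic can be carried out for all three event types at once. The block below is a minimal sketch in base R, not part of the original analysis; the vector names `prior`, `p_full_given`, `joint_full`, `p_full`, and `posterior` are introduced here for illustration, and the numbers are restated from the example.

```r
# Prior probability of each event type (values restated from the example)
prior <- c(Sport = 0.20, Academic = 0.35, NoEvent = 0.45)
# Probability that the garage is full given each event type
p_full_given <- c(Sport = 0.70, Academic = 0.25, NoEvent = 0.05)

# Joint probability of each event type together with a full garage (numerators)
joint_full <- prior * p_full_given
# Total probability that the garage is full (the shared denominator)
p_full <- sum(joint_full)
# Posterior probability of each event type given a full garage
posterior <- joint_full / p_full

round(posterior, 4)
```

The posteriors come out to 0.56 for a sporting event, 0.35 for an academic event, and 0.09 for no event. They sum to 1, which is a quick check that the denominator was built correctly.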
PC1 <- 1-PB_A1 # Not Full - Sport
PC2 <- 1-PB_A2 # Not Full - Academic
PC3 <- 1-PB_A3 # Not Full - No Event
# Create Joint Probabilities #
PAB1 <- PA1 * PB_A1
PAC1 <- PA1 * PC1
PAB2 <- PA2 * PB_A2
PAC2 <- PA2 * PC2
PAB3 <- PA3 * PB_A3
PAC3 <- PA3 * PC3
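Because the six joint probabilities cover every branch of the tree exactly once, they should sum to 1. The following is a small sanity check sketched in base R; the object names `prior`, `p_full`, and `joint` are assumptions made for this illustration, and the inputs are restated so the block runs on its own.

```r
# Values restated from the example so the block is self-contained
prior  <- c(Sport = 0.20, Academic = 0.35, NoEvent = 0.45)
p_full <- c(Sport = 0.70, Academic = 0.25, NoEvent = 0.05)

# Rows: event type; columns: garage state (joint probabilities)
joint <- cbind(Full = prior * p_full, NotFull = prior * (1 - p_full))

# The six joint probabilities should add up to 1
stopifnot(isTRUE(all.equal(sum(joint), 1)))

# Column sums give the marginal probability of each garage state
colSums(joint)
```

The `Full` column sum is the 0.25 used as the denominator in the Bayes calculation, and the `NotFull` column sum is its complement, 0.75.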
Nodes
node1<-"P"
node2<-"Sport"
node3<-"Academic"
node4<-"No Event"
node5<-"Sport Full"
node6<-"Sport N_Full"
node7<-"Acad Full"
node8 <-"Acad N_Full"
node9 <- "No Event Full"
node10 <-"No Event N_Full"
nodeNames<-c(node1,node2,node3,node4,node5,node6,node7,node8,node9,node10)
rEG <- new("graphNEL",
nodes=nodeNames,
edgemode="directed"
)
Branches
rEG <- addEdge(nodeNames[1], nodeNames[2], rEG, 1)
rEG <- addEdge(nodeNames[1], nodeNames[3], rEG, 1)
rEG <- addEdge(nodeNames[1], nodeNames[4], rEG, 1)
rEG <- addEdge(nodeNames[2], nodeNames[5], rEG, 1)
rEG <- addEdge(nodeNames[2], nodeNames[6], rEG, 1)
rEG <- addEdge(nodeNames[3], nodeNames[7], rEG, 1)
rEG <- addEdge(nodeNames[3], nodeNames[8], rEG, 1)
rEG <- addEdge(nodeNames[4], nodeNames[9], rEG, 1)
rEG <- addEdge(nodeNames[4], nodeNames[10], rEG, 1)
eAttrs <- list()
q<-edgeNames(rEG)
Adding Probability
eAttrs$label <- c(toString(PA1),
toString(PA2),
toString(PA3),
toString(PB_A1),
toString(PC1),
toString(PB_A2),
toString(PC2),
toString(PB_A3),
toString(PC3)
)
names(eAttrs$label) <- c(q[1],q[2], q[3], q[4], q[5], q[6], q[7], q[8], q[9])
edgeAttrs<-eAttrs
# Set the color, etc, of the tree
attributes<-list(node=list(label="example",
fillcolor="lightblue",
fontsize=""),
edge=list(color="black"),
graph=list(rankdir="LR"))
# Plot the probability tree using Rgraphviz
plot(rEG, edgeAttrs=eAttrs, attrs=attributes)
nodes(rEG)
## [1] "P" "Sport" "Academic" "No Event"
## [5] "Sport Full" "Sport N_Full" "Acad Full" "Acad N_Full"
## [9] "No Event Full" "No Event N_Full"