2023-11-07

What is Bayesian Statistics?

Bayesian statistics is a probabilistic approach that combines prior beliefs with observed data to quantify uncertainty. It is used for parameter estimation and decision-making under uncertainty, and it underpins methods in fields such as machine learning, including Bayesian networks, where modeling complex, uncertain relationships is crucial.

Bayes’ Theorem

The fundamental equation of Bayesian statistics is Bayes’ theorem:

\[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \]

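Where A is the hypothesis and B is the evidence:
P(A|B) is the posterior: the probability of the hypothesis being true, given the evidence.
P(B|A) is the likelihood: the probability of the evidence occurring if the hypothesis is true.
P(A) is the prior: the probability of the hypothesis being true before the evidence is seen.
P(B) is the total probability of the evidence, summed over every hypothesis that could produce it.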
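
To make the formula concrete, here is a minimal worked example in R. The three input probabilities are hypothetical numbers chosen only for illustration, and the variable names are not from any package.

```r
# Hypothetical inputs, chosen only to illustrate the formula.
prior <- 0.30        # P(A): belief in the hypothesis before seeing evidence
likelihood <- 0.80   # P(B|A): chance of the evidence if the hypothesis is true
false_alarm <- 0.10  # P(B|not A): chance of the evidence if it is false

# P(B) by the law of total probability
evidence <- likelihood * prior + false_alarm * (1 - prior)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
posterior <- likelihood * prior / evidence
round(posterior, 3)  # 0.774
```

With these numbers, seeing the evidence raises the probability of the hypothesis from 0.30 to about 0.77.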

Dataset US Arrests

data(USArrests)
head(USArrests)
           Murder Assault UrbanPop Rape
Alabama      13.2     236       58 21.2
Alaska       10.0     263       48 44.5
Arizona       8.1     294       80 31.0
Arkansas      8.8     190       50 19.5
California    9.0     276       91 40.6
Colorado      7.9     204       78 38.7

US Arrests for Murder

Before applying Bayesian statistics, it helps to visualize the data. USArrests records arrests per 100,000 residents for murder, assault, and rape in each of the 50 US states in 1973, along with the percentage of each state's population living in urban areas. Plotting each variable by state shows how arrest rates are distributed across the country.
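
One way such a plot might be produced in base R is sketched below. The helper `plot_by_state()` is a hypothetical name introduced here for illustration, and the colour and label choices are assumptions rather than the settings behind the original figures.

```r
data(USArrests)

# Sketch: horizontal bar chart of one USArrests column, sorted by value.
# plot_by_state() is a hypothetical helper, not part of the original post.
plot_by_state <- function(column, col = "steelblue") {
  values <- sort(setNames(USArrests[[column]], rownames(USArrests)))
  barplot(values, horiz = TRUE, las = 1, cex.names = 0.5, col = col,
          main = paste("US Arrests for", column),
          xlab = paste(column, "arrests per 100,000 residents"))
  invisible(values)  # return the sorted values so they can be inspected
}

murder_sorted <- plot_by_state("Murder")
```

Calling `plot_by_state("Assault")` draws the same kind of chart for assault arrests.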

US Arrests for Assault

Comparison of UrbanPop against Murder, Rape, and Assault

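UrbanPop correlates positively with Murder, Assault, and Rape, but the strength differs: the correlation is moderate for Rape (roughly 0.41), weak for Assault (roughly 0.26), and close to zero for Murder (roughly 0.07). More urbanized states therefore tend to show higher rape and assault arrest rates, while urbanization alone says little about the murder rate.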
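
The strength of these relationships can be checked directly. The sketch below uses Pearson correlation from base R; the variable name `urban_cor` is introduced here for illustration.

```r
data(USArrests)

# Pearson correlation of UrbanPop with each of the three arrest rates
urban_cor <- cor(USArrests)["UrbanPop", c("Murder", "Assault", "Rape")]
round(urban_cor, 2)

# Scatterplot matrix for a visual check of every pair of variables
pairs(USArrests, main = "USArrests: pairwise relationships")
```

Correlation only measures linear association across the 50 states, so it does not show that urbanization causes higher arrest rates.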

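Using Bayes’ Theorem to predict future murders in US States

Bayes’ theorem can be applied to the “USArrests” dataset to predict murder from a state’s urban population, assault, and rape figures. A full Bayesian model can be fitted with the “rstan” or “brms” packages, which must be installed first; both build on Stan, a programming language for Bayesian data analysis. The base R code further down needs neither package.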
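
As a rough sketch of what a brms model might look like, the function below regresses the murder rate on the other three variables. The wrapper name `fit_murder_model()`, the Gaussian family, the sampler settings, and the reliance on brms default priors are all assumptions made here for illustration; the original post does not specify a model.

```r
# Sketch only: requires the brms package and a working Stan toolchain.
# fit_murder_model() is a hypothetical wrapper, not code from the original post.
fit_murder_model <- function(data = USArrests) {
  if (!requireNamespace("brms", quietly = TRUE)) {
    stop("Install brms first: install.packages(\"brms\")")
  }
  brms::brm(
    Murder ~ UrbanPop + Assault + Rape,  # murder rate as a linear function of the rest
    data = data,
    family = gaussian(),                 # assumed normally distributed errors
    chains = 4, iter = 2000, seed = 1    # sampler settings, chosen arbitrarily
  )
}
```

Running `fit <- fit_murder_model()` compiles and samples the model, which can take a minute or more; `summary(fit)` then reports the posterior for each coefficient, and `brms::posterior_predict(fit, newdata = ...)` draws predicted murder rates for new predictor values.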

\[ P(\text{Murder}|\text{UrbanPop, Assault, Rape}) = \frac{P(\text{UrbanPop, Assault, Rape}|\text{Murder}) \cdot P(\text{Murder})}{P(\text{UrbanPop, Assault, Rape})} \]

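Code applying Bayes’ Theorem to USArrests

data(USArrests)
murderdata <- USArrests$Murder
urbanpopdata <- USArrests$UrbanPop
assaultdata <- USArrests$Assault
rapedata <- USArrests$Rape

# Assumption: the variables are continuous, so each one is turned into a
# yes/no event ("above its median") before probabilities are counted.
highmurder <- murderdata > median(murderdata)
evidence <- urbanpopdata > median(urbanpopdata) &
  assaultdata > median(assaultdata) & rapedata > median(rapedata)

prior <- mean(highmurder)                 # P(Murder)
likelihood <- mean(evidence[highmurder])  # P(UrbanPop, Assault, Rape | Murder)
marginal <- mean(evidence)                # P(UrbanPop, Assault, Rape)
bayestheorem <- likelihood * prior / marginal  # P(Murder | UrbanPop, Assault, Rape)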

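Plot for Bayes’ Theorem

# bayestheorem is a single probability, so a bar is drawn rather than a histogram
barplot(bayestheorem, names.arg = "P(Murder | UrbanPop, Assault, Rape)",
        ylim = c(0, 1), ylab = "Probability", col = "pink",
        main = "Posterior Probability from Bayes' Theorem")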