Fundamentals of Bayesian Data Analysis in R

DataCamp: Statistics with R

Bonnie Cooper

library( dplyr )
library( ggplot2 )
library( gridExtra )
library( tidyverse )
library( ggridges )
library( ggExtra )
library( lattice )
library( RColorBrewer )
library( BEST )

What is Bayesian Data Analysis?

A First Taste of Bayes

Bayesian Inference in a Nutshell:
A method for figuring out unobservable quantities given known facts that uses probability to describe the uncertainty over what the values of the unknown quantities could be.

The course gives a very interesting summary of Alan Turing’s use of Bayesian inference to crack the Enigma code (use a decrypted message to find the probable settings of the Enigma machine wheels). A fuller treatment of the topic can be found here: Alan Turing and Enigma Statistics

Bayesian Data Analysis:

  • The use of Bayesian inference to learn from data (the known facts) about unknown parameters.
  • Can be used for hypothesis testing, linear regression, etc.
  • Its real power: it is flexible and allows you to construct problem-specific models

Bayesian Data Analysis: a tool to make sense of your data

Let’s Try Some Bayesian Data Analysis

Bayesian inference == probabilistic inference.

Probability:

  • A number from 0 to 1
  • A statement about the certainty / uncertainty of an event
  • 1 is complete certainty that a case is TRUE
  • 0 is complete certainty that a case is FALSE
  • Not only for binary yes / no events, but can be applied to continuous variables

The role of probability distributions in Bayesian data analysis is to represent uncertainty, and the role of Bayesian inference is to update probability distributions to reflect what has been learned from data.

Let’s look at a brief illustrative example,
A Bayesian model for the proportion of success
ex: What is the proportion of successful treatments with a new drug?

  • prop_model( data )
  • data is a vector of successes and failures represented by 1s and 0s
  • There is an unknown underlying proportion of success
  • Whether a data point is a success is affected only by the underlying proportion of success
  • Prior to seeing any data, any underlying proportion of success is equally likely
  • The result is a probability distribution that represents what the model knows about the underlying proportion of success

The output of prop_model is a plot showing what the model learns about the underlying proportion of success from each data point in the order you entered them. At n=0 there is no data, and all the model knows is that it’s equally probable that the proportion of success is anything from 0% to 100%. At n=4 all data has been added, and the model knows a little bit more.
We will build toward implementing prop_model() during this course.

…but here is the code from the course author Rasmus Bååth

#The prop_model function - Rasmus Bååth R code

# This function takes a number of successes and failures coded as a TRUE/FALSE
# or 0/1 vector. This should be given as the data argument.
# The result is a visualization of how a Beta-Binomial
# model gradually learns the underlying proportion of successes 
# using this data. The function also returns a sample from the
# posterior distribution that can be further manipulated and inspected.
# The default prior is a Beta(1,1) distribution, but this can be set using the
# prior_prop argument.

# Make sure the packages tidyverse and ggridges are installed, otherwise run:
# install.packages(c("tidyverse", "ggridges"))

# Example usage:
# data <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
# prop_model(data)
prop_model <- function(data = c(), prior_prop = c(1, 1), n_draws = 10000,
                       gr_name="Proportion graph") {
  #library(tidyverse)
 
  data <- as.logical(data)
  # data_indices decides what densities to plot between the prior and the posterior
  # For 39 data points or fewer we're plotting all of them.
  data_indices <- round(seq(0, length(data), length.out = min(length(data) + 1, 40)))
  
  # dens_curves will be a data frame with the x & y coordinates for the 
  # densities to plot where x = proportion_success and y = probability
  proportion_success <- c(0, seq(0, 1, length.out = 100), 1)
  dens_curves <- map_dfr(data_indices, function(i) {
    value <- ifelse(i == 0, "Prior", ifelse(data[i], "Success", "Failure"))
    label <- paste0("n=", i)
    probability <- dbeta(proportion_success,
                         prior_prop[1] + sum(data[seq_len(i)]),
                         prior_prop[2] + sum(!data[seq_len(i)]))
    probability <- probability / max(probability)
    tibble(value, label, proportion_success, probability)
  })
  # Turning label and value into factors with the right ordering for the plot
  dens_curves$label <- fct_rev(factor(dens_curves$label, levels =  paste0("n=", data_indices )))
  dens_curves$value <- factor(dens_curves$value, levels = c("Prior", "Success", "Failure"))
  
  graph_label <- paste("Prior distribution Beta(a =", 
                       as.character(prior_prop[1]),", b =",
                                    as.character(prior_prop[2]),")") 
  
  p <- ggplot(dens_curves, aes(x = proportion_success, y = label,
                               height = probability, fill = value)) +
    ggridges::geom_density_ridges(stat="identity", color = "white", alpha = 0.8,
                                  panel_scaling = TRUE, size = 1) +
    scale_y_discrete("", expand = c(0.01, 0)) +
    scale_x_continuous("Proportion of success") +
    scale_fill_manual(values = hcl(120 * 2:0 + 15, 100, 65), name = "", drop = FALSE,
                      labels =  c("Prior   ", "Success   ", "Failure   ")) +
    ggtitle(paste0(gr_name, ": ", sum(data),  " successes, ", sum(!data), " failures"),
            subtitle = graph_label) +
    labs(caption = "based on Rasmus Bååth R code") +
    theme_light() +
    theme(legend.position = "top")
  print(p)
  
  # Returning a sample from the posterior distribution that can be further 
  # manipulated and inspected
  posterior_sample <- rbeta(n_draws, prior_prop[1] + sum(data), prior_prop[2] + sum(!data))
  invisible(posterior_sample)
}

Zombie apocalypse: let us say there is a new drug to treat zombieism. It has never been tested before, but this pilot test resulted in two recoveries out of 4 subjects. Here, prop_model plots how the posterior distribution for the antidote’s proportion of success updates with each data point:

data <- c( 1, 0, 0, 1 )
prop_model( data )

Next we observe several more subjects; however, they are all failures, leaving just 2 successes out of 13 subjects. Here is how the posterior distribution changes with the new case information:

data <- c( 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
prop_model( data )

What if we observe a lot more?

data <- rbinom( 50, 1, 0.2 )
prop_model( data )

Compare this experimental zombie drug’s success with that of the currently used drug which has a 7% success rate:

data2 <- rbinom( 50, 1, 0.07 )
posterior2 <- prop_model( data2 )

head( posterior2 )
## [1] 0.07063968 0.10491471 0.02643282 0.02516204 0.04652902 0.10972237

Subjective observation time: the more data collected about a variable, the more the posterior distribution concentrates around the ‘true’ proportion of success.

  • Take a prior probability distribution and use observations from the data to update it into a posterior probability distribution
  • prior - a probability distribution that represents what the model knows before seeing the data
  • posterior - a probability distribution that represents what the model knows after having seen the data.
data = c(1, 0, 0, 1, 0, 0,
         0, 0, 0, 0, 0, 0, 0)
         
# Extract and explore the posterior
posterior <- prop_model( data )

head( posterior )
## [1] 0.1633159 0.1470803 0.2724994 0.1360496 0.1207068 0.1345991

plot the posterior distribution:

hist( posterior, breaks = 30, xlim = c( 0,1 ), col = 'palegreen4' )

Calculate a few descriptive statistics for the posterior distribution:

# Calculate the median
median(posterior)
## [1] 0.1850544
# Calculate the credible 90% interval
quantile(posterior, c(0.05, 0.95))
##         5%        95% 
## 0.06265585 0.38235821
# Calculate the probability that the success rate is greater than 7%
sum( posterior > 0.07 )/length(posterior)
## [1] 0.9317

Now let’s try to generate a similar plot with ggplot:

post_df <- data.frame( 'posterior' = posterior )

ggplot( post_df, aes( x = posterior ) ) +
  geom_histogram( fill = 'palegreen4', color = "#e9ecef" ) +
  ggtitle( 'Posterior Probability Distribution' )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

and display some summary statistics:

summary( post_df )
##    posterior       
##  Min.   :0.005171  
##  1st Qu.:0.124192  
##  Median :0.185054  
##  Mean   :0.198854  
##  3rd Qu.:0.260246  
##  Max.   :0.671825

How to interpret these summary statistics?
“Given the data of two cured and 11 relapsed zombies, there is a 90% probability that the drug cures between 6% and 39% of treated zombies. Further, there is a 93% probability that this new drug is more effective than the current zombie drug.”
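As a complementary check (not part of the course exercise; a sketch using the samples already in hand), we can also compare the two posterior samples draw by draw, which accounts for the uncertainty in both drugs instead of treating 7% as a fixed value:

# Probability that the new drug's success rate exceeds the old drug's,
# comparing the two posterior samples element-wise (both hold 10,000 draws)
mean( posterior > posterior2 )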

Compare the 2 posterior probability distributions in an overlapped histogram:

post_df <- data.frame( 'posterior' = posterior, 'posterior2' = posterior2 )
post_long <- post_df %>%
  pivot_longer( cols = everything(), names_to = 'dist', values_to = 'prob' ) %>%
  ggplot( aes( x = prob, fill = dist ) ) +
  geom_histogram( color = "#e9ecef" ) +
  scale_fill_manual(values=c("palegreen4", "#404080"))
post_long
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

How does Bayesian Inference Work?

The Parts Needed for Bayesian Inference

Bayesian Inference:

  • \(\mbox{Bayesian Inference } = \mbox{Data } + \mbox{ a Generative Model } + \mbox{ Priors }\)
  • Generative Model: a computer program, mathematical expression, or set of rules that can be fed parameter values and used to simulate data.
# Parameters
prop_success <- 0.15
n_zombies <- 13
#simulate the data with rbinom()
data <- rbinom( n_zombies, 1, prop_success )
data
##  [1] 0 0 0 1 0 0 0 0 0 0 0 0 0

A similar approach was presented in the course video:

data <- c()
for( zombie in 1 : n_zombies ) {
  data[ zombie ] <- runif( 1, min = 0, max = 1 ) < prop_success
}
data <- as.numeric( data )
data
##  [1] 0 0 0 0 0 0 1 0 0 0 1 0 0

This loop gives an idea of the logic behind the previous code. However, using rbinom() is more efficient. Now run the generative model where the success rate is estimated at 42% and the test is done with 100 zombies.

prop_success <- 0.42
n_zombies <- 100
#simulate the data with rbinom()
data <- rbinom( 1, n_zombies, prop_success )
data
## [1] 51

51 zombies were cured in this run of the generative model. How about if it is run 200 times?

data <- rbinom( 200, n_zombies, prop_success )
data
##   [1] 46 32 41 41 42 38 48 38 40 44 44 37 38 46 43 43 47 50 41 42 41 42 40 43 42
##  [26] 42 31 42 41 45 46 44 47 48 46 44 45 31 42 49 35 46 39 35 42 42 46 47 41 47
##  [51] 40 48 35 40 46 40 38 38 45 36 41 35 36 44 51 40 39 40 43 45 46 39 47 47 44
##  [76] 42 45 48 34 32 43 42 43 45 39 44 41 42 40 42 45 48 34 46 43 50 43 35 34 42
## [101] 37 40 45 49 38 36 42 45 28 38 43 41 47 33 37 44 34 48 47 32 36 37 41 48 36
## [126] 38 44 37 39 38 43 40 46 41 44 42 34 37 50 46 38 47 35 41 47 42 40 31 47 42
## [151] 44 40 51 45 46 34 40 37 40 40 42 39 44 43 40 35 45 39 41 45 50 39 37 37 47
## [176] 52 44 45 43 52 33 33 47 49 34 43 39 38 46 44 48 39 30 38 43 37 38 37 43 44

Above are the results of 200 simulations of the binomial generative model with the given parameters. Show some summary stats for this distribution…

summary( data )
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   28.00   38.00   42.00   41.53   45.00   52.00
mean( data )
## [1] 41.535

Using a Generative Model

Apply our generative model to a relatively large number of simulated trials; the results will form the probability distribution for our process:

cured_zombies <- rbinom( n = 100000, size = 100, prob = 0.07 )
post_df <- data.frame( 'Cured_Zombies' = cured_zombies )

ggplot( post_df, aes( x = Cured_Zombies ) ) +
  geom_histogram( binwidth = 1, fill = 'palegreen4', color = "#e9ecef" ) +
  ggtitle( 'Cured Zombies Probability Distribution' )

More often, though, we already have the data and do not need a generative model to create it; what we do not know are the parameters that drive the process. Bayesian inference inverts the probability: we work backwards to learn about the underlying process given the data at hand.

New Situations: instead of the zombie example, let’s look at clickthrough rates….
How many visitors to a site?
To get more visitors to your website you are considering paying for an ad to be shown 100 times on a popular social media site. According to the social media site, their ads get clicked on 10% of the time.

# Fill in the parameters
n_samples <- 100000
n_ads_shown <- 100
proportion_clicks <- 0.1
n_visitors <- rbinom(n_samples, size = n_ads_shown, 
                     prob = proportion_clicks)

post_df <- data.frame( 'Site_Visits' = n_visitors )

ggplot( post_df, aes( x = Site_Visits ) ) +
  geom_histogram( binwidth = 1, fill = 'cadetblue', color = "#e9ecef" ) +
  ggtitle( 'Number of Visitors Distribution' )

Representing Uncertainty with Priors

And now to address the priors. Prior probability distribution: represents how uncertain the model is about the parameter values before seeing any data. Ideally, this should reflect our actual uncertainty about those values. It is called a prior because it represents the uncertainty before information from the data has been included.

You’re not so sure that your ad will get clicked on exactly 10% of the time. Instead of assigning proportion_clicks a single value you are now going to assign it a large number of values drawn from a probability distribution.

n_samples <- 100000
n_ads_shown <- 100
# model a uniform probability distribution that ranges from 0 to 0.2 to represent the uncertainty of our value for this prior
proportion_clicks <- runif( n = n_samples, min = 0.0, max = 0.2 )
n_visitors <- rbinom( n_samples, n_ads_shown, proportion_clicks )

Visualize the probability distribution for the prior, proportion_clicks.

pclick_df <- data.frame( 'probability_click' = proportion_clicks )

pclick_plot <- ggplot( pclick_df, aes( x = probability_click ) ) +
  geom_histogram( fill = 'cadetblue', color = "#e9ecef" ) +
  ggtitle( 'Probability Distribution of the Prior' )

nvis_df <- data.frame( 'Site_Visits' = n_visitors )

nvis_plot <- ggplot( nvis_df, aes( x = Site_Visits ) ) +
  geom_histogram( binwidth = 1, fill = 'cadetblue', color = "#e9ecef" ) +
  ggtitle( 'Number of Visitors Distribution' )

grid.arrange( pclick_plot, nvis_plot, ncol = 2 )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The envelope of the n_visitors data from this generative model is much wider. The added uncertainty in the prior increases the uncertainty in the number of visitors the site will receive.
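A small numeric sketch (not from the course) of that effect: compare the spread of simulated visitor counts when proportion_clicks is fixed at 0.1 with the spread when it is drawn from the uniform prior.

# Spread of simulated visitor counts under a fixed vs. an uncertain click rate
sd( rbinom( 100000, size = 100, prob = 0.1 ) )
sd( rbinom( 100000, size = 100, prob = runif( 100000, min = 0.0, max = 0.2 ) ) )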

Bayesian Models and Conditioning

What if we send our ad out into the wilds of the marketplace and get some data on its performance? Let’s say 13 out of 100 users who were shown the ad followed through and clicked to visit the website. With this data, we can now apply some Bayesian inference….

#this dataframe represents the joint probability distribution over proportion_clicks and n_visitors together.
prior_df <- data.frame( 'proportion_clicks' = proportion_clicks, 'n_visitors' = n_visitors )

#visualize as a scatterplot....
prior_plot <- ggplot( prior_df, aes( x = n_visitors, y = proportion_clicks ) ) +
  geom_point( alpha = 1/10 ) +
  #geom_jitter(alpha = 1/10 ) +
  theme( legend.position = 'none' )
prior_plot <- ggMarginal( prior_plot, type = 'histogram', 
                          xparams = list(  bins=20 ),
                          yparams = list(  bins=20))
prior_plot

The histograms show the marginal distributions. These are the same probability distributions that were described and visualized in the previous exercises.

There is a clear trend in the data: the higher the underlying proportion of clicks, the more visitors there will be to the website

From this joint probability data, we can come up with probability distributions for specific conditional situations. For example, if we fix a conditional value of proportion_clicks = 0.10, then we can describe the distribution of n_visitors

prior10_df <- prior_df %>%
  mutate( p10 = round( proportion_clicks, 2) == 0.1 ) %>%
  filter( p10 == TRUE ) %>%
  select( c( n_visitors, proportion_clicks ) )

prior_plot <- ggplot( prior_df, aes( x = n_visitors, y = proportion_clicks ) ) +
  geom_point( alpha = 1/10 ) +
  geom_point(data = prior10_df, aes(x = n_visitors, y = proportion_clicks, color = 'red' ) ) +
  theme( legend.position = 'none' )
prior_plot <- ggMarginal( prior_plot, type = 'histogram', 
                          xparams = list(  bins=20 ),
                          yparams = list(  bins=20))
prior_plot

And now to plot this distribution:

ggplot( prior10_df, aes( x = n_visitors ) ) +
  geom_histogram( binwidth = 1, fill = 'red', color = "#e9ecef" ) +
  ggtitle( 'Probability Distribution of n_visitors given proportion_clicks = 0.1' )

If we shift the proportion_clicks, the distribution of n_visitors shifts accordingly. Additionally, we can condition the proportion_clicks on a given n_visitors

This demonstration brings us to the essence of Bayesian inference:
Bayesian inference is conditioning on data, in order to learn about parameter values

Let’s condition on the data we collected, where we got 13 visits from 100 presentations of the ad. Find the posterior distribution for the proportion of clicks given there were 13 visitors. It is called a posterior because it represents the uncertainty after having included information from the observed data.

head( prior_df )
##   proportion_clicks n_visitors
## 1        0.07408206         10
## 2        0.14653305         14
## 3        0.02444900          1
## 4        0.14785947         16
## 5        0.06287937          3
## 6        0.09022803          9
#condition the joint distribution with the observation that there were 13 visitors
# Create the posterior data frame
posterior <- prior_df[prior_df$n_visitors == 13, ]

# Visualize posterior proportion clicks
hist( posterior$proportion_clicks )

Now we can use this posterior as our new prior to predict n_visitors if we rerun the ad campaign:

# Assign posterior to a new variable called prior
prior <- posterior

# Take a look at the first rows in prior
head(prior)
##     proportion_clicks n_visitors
## 27         0.13675733         13
## 41         0.13979790         13
## 56         0.10820342         13
## 85         0.08331700         13
## 89         0.10834601         13
## 103        0.07735876         13
# Replace prior$n_visitors with a new sample and visualize the result
n_samples <-  nrow(prior)
n_ads_shown <- 100
prior$n_visitors <- rbinom(n_samples, size = n_ads_shown,
                           prob = prior$proportion_clicks)
hist(prior$n_visitors)

# Calculate the probability that you will get 5 or more visitors
sum(prior$n_visitors >= 5) / length(prior$n_visitors)
## [1] 0.9878283

From the observation of 13/100 visits, we find that it is very probable (about 99%) that future ad campaigns will generate at least 5 visits.

Why use Bayesian Data Analysis?

Four Good Things with Bayes

Bayes is very Flexible!

  1. Can include information sources in addition to the data
  • Background Information
  • Expert Opinion
  • Common Knowledge
  • ex: suppose you ask a vendor what the median and range of success has been for previous ad campaigns.
  2. Can make many comparisons between groups or data sets
  • Can test different experimental conditions
  • Can easily find the probable difference between treatment groups
  • ex: suppose you have two different treatment groups
  3. Can use the result of Bayesian analysis to do decision analysis
  • Decision analysis: take the result of a statistical analysis and post-process it to apply to a decision of interest.
  • The posterior distributions are not of principal interest; rather, they are used to reach an outcome that meets a goal (e.g. highest return / clickthrough).
  4. Can change the underlying statistical model
  • e.g. switch away from a binomial model or add new variables if the data indicate the need for a change

Two caveats:

  1. Bayes is optimal only in the small world that is the model
  2. In Bayesian data analysis there is a separation between model and computation

Example: suppose you ask a vendor what the median and range of success has been for previous ad campaigns. The vendor responds that most ads get 5% clicks, with a range of 2-8%. This information can be incorporated by updating the priors of a Bayesian statistical model.

We will use the beta distribution, sampled with rbeta(), as our informed prior.

# Explore using the rbeta function
beta_sample <- rbeta(n = 1000000, shape1 = 1, shape2 = 1)

# Visualize the results
hist( beta_sample )

The above distribution is an uninformative prior because it gives equal probability to all values from 0 to 1.
rbeta() takes two shape parameters that must be positive. The larger the shape1 parameter, the closer the mean of the distribution is to 1. The larger the shape2 parameter, the closer the mean of the distribution is to 0.
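A quick sanity check of this (not part of the course): the mean of a Beta(shape1, shape2) distribution is shape1 / (shape1 + shape2), which the sample means should land close to.

# Sample means should be near the theoretical Beta means
mean( rbeta( n = 1000000, shape1 = 100, shape2 = 20 ) )  # ~ 100 / 120 = 0.83
mean( rbeta( n = 1000000, shape1 = 20, shape2 = 100 ) )  # ~ 20 / 120 = 0.17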

# Modify the parameters
beta_sample1 <- data.frame( 'dist' = rbeta(n = 1000000, shape1 = 100, shape2 = 20) )
# Visualize the results
close21 <- ggplot( beta_sample1, aes( x = dist ) ) +
  geom_histogram( fill = 'cadetblue', color = "#e9ecef" ) +
  ggtitle( 'rbeta(shape1 = 100, shape2 = 20)' )
# Modify the parameters
beta_sample2 <- data.frame( 'dist' = rbeta(n = 1000000, shape1 = 20, shape2 = 100) )
# Visualize the results
close20 <- ggplot( beta_sample2, aes( x = dist ) ) +
  geom_histogram( fill = 'cadetblue', color = "#e9ecef" ) +
  ggtitle( 'rbeta(shape1 = 20, shape2 = 100)' )

grid.arrange( close21, close20, ncol = 2 )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

There are many values that could be used to construct a reasonable prior distribution based on the information given to us, but here is a good suggestion:

# Explore using the rbeta function
beta_sample <- rbeta(n = 1000000, shape1 = 5, shape2 = 95)

# Visualize the results
hist( beta_sample )

Now for some excitement!: Let’s update the model developed in the previous chapter with our new, informed prior distribution for proportion_clicks

Old Prior:

n_draws <- 100000
n_ads_shown <- 100

# Change the prior on proportion_clicks
proportion_clicks <- 
  runif(n_draws, min = 0.0, max = 0.2)
n_visitors <- 
  rbinom(n_draws, size = n_ads_shown, 
         prob = proportion_clicks)
prior <- 
  data.frame(proportion_clicks, n_visitors)
posterior <- 
  prior[prior$n_visitors == 13, ]

# This plots the prior and the posterior in the same plot
par(mfcol = c(2, 1))
hist(prior$proportion_clicks, 
     xlim = c(0, 0.25))
hist(posterior$proportion_clicks, 
     xlim = c(0, 0.25))

New Prior

n_draws <- 100000
n_ads_shown <- 100

# Change the prior on proportion_clicks
proportion_clicks <- 
  rbeta(n_draws, shape1 = 5, shape2 = 95)
n_visitors <- 
  rbinom(n_draws, size = n_ads_shown, 
         prob = proportion_clicks)
prior <- 
  data.frame(proportion_clicks, n_visitors)
posterior <- 
  prior[prior$n_visitors == 13, ]

# This plots the prior and the posterior in the same plot
par(mfcol = c(2, 1))
hist(prior$proportion_clicks, 
     xlim = c(0, 0.25))
hist(posterior$proportion_clicks, 
     xlim = c(0, 0.25))

The proportion_clicks prior is Dead! Long live the proportion_clicks prior!
Note the change in the prior distribution, but also the leftward shift of the posterior. The information in the informed prior tames the more enthusiastic result from the small data sample we received, with its 13% click rate. The posterior distribution is shaped by both the observed data and the prior information. Which posterior should we use? The naive model favors the data we have collected, whereas the informed posterior incorporates the outside information we have gathered. If that information is correct, the informed posterior should lead to better estimates of future outcomes.
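Here is a minimal sketch (not from the course) that reruns both models and puts their posteriors side by side; posterior_uniform and posterior_beta are hypothetical names introduced just for this comparison.

n_draws <- 100000
n_ads_shown <- 100
# Helper: simulate the joint distribution and condition on 13 visitors
sim_posterior <- function( proportion_clicks ) {
  n_visitors <- rbinom( n_draws, size = n_ads_shown, prob = proportion_clicks )
  prior <- data.frame( proportion_clicks, n_visitors )
  prior[ prior$n_visitors == 13, ]
}
posterior_uniform <- sim_posterior( runif( n_draws, min = 0.0, max = 0.2 ) )
posterior_beta <- sim_posterior( rbeta( n_draws, shape1 = 5, shape2 = 95 ) )
# Median and 90% credible interval under each prior
quantile( posterior_uniform$proportion_clicks, c( 0.05, 0.5, 0.95 ) )
quantile( posterior_beta$proportion_clicks, c( 0.05, 0.5, 0.95 ) )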

Contrasts and Comparisons

Comparing Video and Text ads

posterior <- data.frame( 'video_prop' = rbeta(n = 1000000, shape1 = 13, shape2 = 87 ),
                         'text_prop' = rbeta(n = 1000000, shape1 = 6, shape2 = 94 ) ) %>%
  mutate( 'prop_diff' = video_prop - text_prop )
head( posterior )
##   video_prop  text_prop   prop_diff
## 1 0.12683939 0.08997656 0.036862837
## 2 0.14732568 0.07814926 0.069176427
## 3 0.11575964 0.10835300 0.007406641
## 4 0.09275093 0.06422393 0.028526996
## 5 0.12079666 0.03355893 0.087237732
## 6 0.22117221 0.03616603 0.185006184

The new column, prop_diff, describes the difference in the proportion of clicks between the two conditions.

visualize:

posterior_long <- posterior %>%
  pivot_longer( cols = everything(), names_to = 'dist', values_to = 'val' )

ggplot(posterior_long , 
       aes(x=val, fill=dist)) + geom_histogram(alpha=0.4, position="identity")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Define parameters
n_draws <- 100000
n_ads_shown <- 100
proportion_clicks <- runif(n_draws, min = 0.0, max = 0.2)
n_visitors <- rbinom(n = n_draws, size = n_ads_shown, 
                     prob = proportion_clicks)
prior <- data.frame(proportion_clicks, n_visitors)

# Create the posteriors for video and text ads
posterior_video <- prior[prior$n_visitors == 13, ]
posterior_text <- prior[prior$n_visitors == 6, ]

# Visualize the posteriors
hist(posterior_video$proportion_clicks, xlim = c(0, 0.25))

hist(posterior_text$proportion_clicks, xlim = c(0, 0.25))

posterior <- data.frame(
    video_prop = posterior_video$proportion_clicks[1:4000],
    text_prop  = posterior_text$proportion_clicks[1:4000])
    
# Calculate the posterior difference: video_prop - text_prop
posterior$prop_diff = posterior$video_prop - posterior$text_prop 

# Visualize prop_diff
hist(posterior$prop_diff, xlim = c(0, 0.25))

# Calculate the median of prop_diff
median(posterior$prop_diff)
## [1] 0.06738542
# Calculate the proportion
# That is, calculate the proportion of samples in posterior$prop_diff that are more than zero.
sum( posterior$prop_diff > 0 ) / length( posterior$prop_diff )
## [1] 0.9515

Decision Analysis

Bayesian data analysis makes it pretty easy to compare and contrast parameter estimates.

Decision Analysis: using the posteriors to inform a decision about the process of interest, e.g. now that we know the posteriors, which will be more lucrative, a video or a text ad campaign?

video_cost <- 0.25 #per video ad
text_cost <- 0.05 #per text ad
visitor_spend <- 2.53 #return for each visitor to site

posterior$video_profit <- posterior$video_prop * visitor_spend - video_cost
posterior$text_profit <- posterior$text_prop * visitor_spend - text_cost
posterior$profit_diff <- posterior$video_profit - posterior$text_profit

glimpse( posterior )
## Rows: 4,000
## Columns: 6
## $ video_prop   <dbl> 0.09093018, 0.18179948, 0.17805112, 0.11829070, 0.143525…
## $ text_prop    <dbl> 0.05785093, 0.08097478, 0.04782950, 0.05842797, 0.060830…
## $ prop_diff    <dbl> 0.033079259, 0.100824699, 0.130221624, 0.059862734, 0.08…
## $ video_profit <dbl> -0.019946634, 0.209952674, 0.200469343, 0.049275476, 0.1…
## $ text_profit  <dbl> 0.09636284, 0.15486619, 0.07100863, 0.09782276, 0.103901…
## $ profit_diff  <dbl> -0.116309475, 0.055086488, 0.129460709, -0.048547283, 0.…

Which type of ad will give the most money?

posterior_long <- posterior %>%
  select( c( video_profit, text_profit ) ) %>%
  pivot_longer( cols = everything(), names_to = 'dist', values_to = 'val' )

ggplot(posterior_long , 
       aes(x=val, fill=dist)) + geom_histogram(alpha=0.4, position="identity")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot( posterior, aes( x = profit_diff ) ) +
  geom_histogram() +
  geom_vline( xintercept = 0, color = 'red' ) +
  geom_vline( xintercept = median( posterior$profit_diff ),
              color = 'blue' ) +
  annotate( geom = 'text', 
            x = median( posterior$profit_diff ) - 0.01,
            y = 150, label = 'Median Value', 
            color = 'blue', angle = 90 )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Calculate a "best guess" for the difference in profits
median(posterior$profit_diff)
## [1] -0.02951489
# Calculate the probability that text ads are better than video ads
sum( posterior$profit_diff < 0 ) / length( posterior$profit_diff )
## [1] 0.619

In conclusion, even though text ads get a lower proportion of clicks, they are also much cheaper. And, as you have calculated, there is about a 62% probability that text ads are better.
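As a follow-up decision summary (not part of the course exercise), the expected profit per ad shown under each strategy can be read directly off the same posterior samples:

# Expected profit per shown ad for each campaign type
mean( posterior$video_profit )
mean( posterior$text_profit )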

Change Anything and Everything

Let us say that we can also show a banner ad on a website. We have tested this out and received 19 clicks for a 24 hour period. What is a good generative model to use here?

Poisson Distribution:

  • takes 1 parameter: the mean number of events per time unit
  • rpois() samples from the Poisson distribution
#let's say the mean clicks is 20/day
n_clicks <- rpois( n = 100000, lambda = 20 )
hist( n_clicks )

Explore the Poisson Distribution more before we move on with Bayes

# Simulate from a Poisson distribution and visualize the result
x  <- rpois(n = 10000, lambda = 3)
hist( x )

#Let's say that you run an ice cream stand and on cloudy days you on average sell 11.5 ice creams. It's a cloudy day.
x  <- rpois(n = 10000, lambda = 11.5)
hist( x )

#It's still a cloudy day, and unfortunately, you won't break even unless you sell 15 or more ice creams.
# Calculate the probability of break-even
sum( x >= 15 )/length( x )
## [1] 0.1851

Let’s change the generative model to a Poisson model:

# Change the model according to instructions
n_draws <- 100000
mean_clicks <- runif(n_draws, min = 0, max = 80)
n_visitors <- rpois(n = n_draws, mean_clicks)

prior <- data.frame(mean_clicks, n_visitors)
posterior <- prior[prior$n_visitors == 19, ]

# Visualize mean_clicks
hist( prior$mean_clicks )

hist( posterior$mean_clicks)

Bayes is Optimal, kind of….

Bayes is optimal in the small world of the model.

Bayesian Inference with Bayes’ Theorem

Probability Rules

This chapter is about fitting Bayesian models more efficiently.

Probability Theory

  • Probability
    • a number between 0 and 1
    • A statement of certainty/uncertainty
  • Mathematical Notation
    • P(n = 1) is a probability
    • P(n) is a probability distribution
    • P(n = 1|m = 1) is a conditional probability
    • P(n|m = 1) is a conditional probability distribution
  • Manipulating (independent) Probabilities
    • The Sum Rule
      • p( 1 or 2 or 3 ) = 1/6 + 1/6 + 1/6 = 0.5
    • The Product Rule
      • p( 6 and 6 ) = 1/6 * 1/6 = 1/36 ≈ 2.8% (both rules are checked in the short sketch after this list)
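A quick check of these two rules in R (a sketch, not from the course):

# Sum rule for mutually exclusive outcomes of a fair die: P(1 or 2 or 3)
1/6 + 1/6 + 1/6
# Product rule for independent events: P(6 and 6) on two dice
1/6 * 1/6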

Calculating Likelihoods

We simulated using the rbinom(), rpois(), etc. functions; now we can calculate probabilities directly using the corresponding d-functions such as dbinom() and dpois().

#calculate P(n_visitors = 13 | prob_success = 10%)
dbinom( 13, size = 100, prob = 0.1)
## [1] 0.07430209
#calculate P(n_visitors = 13 or n_visitors = 14 | prob_success = 10%)
dbinom( 13, size = 100, prob = 0.1 ) + dbinom( 14, size = 100, prob = 0.1 )
## [1] 0.1256059

Calculating probability distributions

#calculate P( n_visitors | prob_success = 0.1 )
n_visitors <- seq( 0, 100, by = 1 )
probability <- dbinom( n_visitors, size = 100, prob = 0.1 )
plot( n_visitors, probability, type = 'h' )

Probability Density… Continuous Distributions

dunif( x = 0.12, min = 0, max = 0.2 )
## [1] 5
# Explore using dbinom to calculate probability distributions
n_ads_shown <- 100
proportion_clicks <- 0.1
n_visitors <- seq(0, 100)
prob <- dbinom(n_visitors, 
    size = n_ads_shown, prob = proportion_clicks)

# Plot the distribution
plot( n_visitors, prob, type = 'h')

# Change the code according to the instructions
n_ads_shown <- 100
proportion_clicks <- seq(0, 1, by = 0.01)
n_visitors <- 13
prob <- dbinom(n_visitors, 
    size = n_ads_shown, prob = proportion_clicks)

plot(proportion_clicks, prob, type = "h")

Bayesian Calculation

Bayesian Inference by Calculation. Instead of simulating probabilities, we will directly calculate them in R.

n_ads_shown <- 100
n_visitors <- seq( 0, 100, by = 1 )
proportion_clicks <- seq( 0, 1, 0.01 )
pars <-  expand.grid( proportion_clicks = proportion_clicks,
                   n_visitors = n_visitors ) 
pars$prior <- dunif( pars$proportion_clicks, min = 0, max = 0.2 )
pars$likelihood <- dbinom( pars$n_visitors, size = n_ads_shown,
                           prob = pars$proportion_clicks)
pars$probability <- pars$likelihood * pars$prior
pars$probability <- pars$probability / sum( pars$probability )

glimpse( pars )
## Rows: 10,201
## Columns: 5
## $ proportion_clicks <dbl> 0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.0…
## $ n_visitors        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ prior             <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
## $ likelihood        <dbl> 1.000000e+00, 3.660323e-01, 1.326196e-01, 4.755251e…
## $ probability       <dbl> 4.761905e-02, 1.743011e-02, 6.315217e-03, 2.264405e…
ggplot( pars, aes( x = n_visitors, y = proportion_clicks,
                   fill = probability ) ) +
  geom_tile() +
  xlim( c( 0, 35 ) ) +
  ylim( c( 0, 0.2 ) ) +
  scale_fill_gradient(low="white", high="darkblue")
## Warning: Removed 9445 rows containing missing values (geom_tile).

pars13 <- pars[ pars$n_visitors == 13, ]
pars13$probability <- pars13$probability / sum( pars13$probability )

ggplot( pars13, aes( x = n_visitors, y = proportion_clicks,
                   fill = probability ) ) +
  geom_tile() +
  xlim( c( 0, 35 ) ) +
  ylim( c( 0, 0.2 ) ) +
  scale_fill_gradient(low="white", high="darkblue")
## Warning: Removed 80 rows containing missing values (geom_tile).

ggplot( pars13, aes( x = proportion_clicks, y = probability ) ) +
  geom_area() +
  xlim( c( 0, 0.3 ) )
## Warning: Removed 70 rows containing missing values (position_stack).

A conditional shortcut: You can directly condition on the data, no need to first create the joint distribution.

# Simplify the code below by directly conditioning on the data
n_ads_shown <- 100
proportion_clicks <- seq(0, 1, by = 0.01)
n_visitors <- 6
pars <- expand.grid(proportion_clicks = proportion_clicks,
                    n_visitors = n_visitors)
pars$prior <- dunif(pars$proportion_clicks, min = 0, max = 0.2)
pars$likelihood <- dbinom(pars$n_visitors, 
    size = n_ads_shown, prob = pars$proportion_clicks)
pars$probability <- pars$likelihood * pars$prior
pars$probability <- pars$probability / sum(pars$probability)
plot(pars$proportion_clicks, pars$probability, type = "h")

Bayes’ Theorem

This is an implementation of Bayes’ Theorem:

pars$probability <- pars$likelihood * pars$prior
pars$probability <- pars$probability / sum( pars$probability )

Now to write this in proper notation:

\[P(\theta|D) = \frac{\mbox{P}(D|\theta) \cdot \mbox{P}(\theta)}{\sum \mbox{P}(D|\theta) \cdot \mbox{P}(\theta)}\] where:

  • \(\theta\) = parameters
  • \(D\) = the Data

The probability of the different parameter values given some data equals the likelihood (the relative probability of the data given those parameter values) multiplied by the prior probability of the parameter values before seeing the data, normalized by the total sum of the likelihood weighted by the prior.

In the previous chapter, we used simulation to find the solution. In this chapter we calculated the probability using an algorithm called grid approximation. However, there are many more algorithms to fit Bayesian models, some being more efficient than others.

tilde notation \(\sim \longrightarrow \mbox{is distributed as }\)

More Parameters, More Data, and More Bayes…

The Temperature in a Normal Lake

temp <- c( 19, 23, 20, 17, 23 )
# in Fahrenheit: 66, 73, 68, 63, 73

Use the Normal distribution as a generative model for values centered about a mean value.

rnorm( n = 5, mean =  20, sd = 2 )
## [1] 17.59292 22.73173 22.78236 19.48360 19.16987

use dnorm() to look at how likely an outcome is given some fixed parameters

like <- dnorm( x = temp, mean = 20, sd = 2 )
like
## [1] 0.1760327 0.0647588 0.1994711 0.0647588 0.0647588

the results are the relative likelihoods of the data points.

#log likelihood
log(like)
## [1] -1.737086 -2.737086 -1.612086 -2.737086 -2.737086
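Because the five measurements are assumed independent, their joint likelihood is the product of the individual likelihoods (equivalently, the exponential of the summed log-likelihoods); the same product reappears later in the grid approximation. A small sketch:

# Joint likelihood of all five temperatures given mean = 20, sd = 2
prod( like )
exp( sum( log( like ) ) )  # identical value, computed via the log-likelihood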

using rnorm & dnorm:

# Assign mu and sigma
mu <- 3500
sigma <- 600

par(mfrow = c( 1,2 ) ) 
weight_distr <- rnorm(n = 100000, mean = mu, sd = sigma)
p1 <- hist(weight_distr, 60, xlim = c(0, 6000), col = "lightgreen")

# Create weight
weight <- seq(0, 6000, by = 100)

# Calculate likelihood
likelihood <- dnorm(weight, mu, sigma)

# Plot the distribution of weight
p2 <- plot( x = weight,
y = likelihood,
type = 'h' )

A Bayesian Model of Water Temperature

So, we have our temps: temp = 19, 23, 20, 17, 23
and we have our generative model: \(\mbox{temp}_i \sim \mbox{Normal}(\mu,\sigma)\)

Now we need our prior distributions… our informed guess about water temperatures is that we can expect the following standard deviation and mean:
\(\sigma \sim \mbox{Uniform}(\mbox{min: }0,\mbox{max: }10)\)
\(\mu \sim \mbox{Normal}(\mbox{mean: }18,\mbox{sd: }5)\)

we could have used many other priors, but this is our best guess for now.

Let’s fit our new normal model using the grid approximation method we used earlier

temp <- c( 19, 23, 20, 17, 23 )
mu <- seq( 8, 30, by = 0.5 )
sigma <- seq( 0.1, 10, by = 0.3 )
pars <- expand.grid( mu = mu, sigma = sigma )

plot( pars, pch = 19, main = 'The Parameter Space' )

#the priors
pars$mu_prior <- dnorm( pars$mu, mean = 18, sd = 5 )
pars$sigma_prior <- dunif( pars$sigma, min = 0, max = 10 )
pars$prior <- pars$mu_prior * pars$sigma_prior
for( i in 1:nrow( pars ) ) {
  likelihoods <- dnorm( temp, pars$mu[i], pars$sigma[i])
  pars$likelihood[i] <- prod( likelihoods )
}

#calculate the posterior probability
pars$probability <- pars$likelihood * pars$prior
pars$probability <- pars$probability / sum( pars$probability )

summary( pars )
##        mu           sigma          mu_prior         sigma_prior 
##  Min.   : 8.0   Min.   : 0.10   Min.   :0.004479   Min.   :0.1  
##  1st Qu.:13.5   1st Qu.: 2.50   1st Qu.:0.018810   1st Qu.:0.1  
##  Median :19.0   Median : 5.05   Median :0.043570   Median :0.1  
##  Mean   :19.0   Mean   : 5.05   Mean   :0.043233   Mean   :0.1  
##  3rd Qu.:24.5   3rd Qu.: 7.60   3rd Qu.:0.066645   3rd Qu.:0.1  
##  Max.   :30.0   Max.   :10.00   Max.   :0.079788   Max.   :0.1  
##      prior             likelihood         probability       
##  Min.   :0.0004479   Min.   :0.000e+00   Min.   :0.000e+00  
##  1st Qu.:0.0018810   1st Qu.:4.000e-12   1st Qu.:2.000e-09  
##  Median :0.0043570   Median :1.972e-08   Median :1.106e-05  
##  Mean   :0.0043233   Mean   :4.214e-07   Mean   :6.536e-04  
##  3rd Qu.:0.0066645   3rd Qu.:1.664e-07   3rd Qu.:2.307e-04  
##  Max.   :0.0079788   Max.   :1.175e-05   Max.   :1.891e-02

now to visualize:

ggplot( pars, aes( x = mu, y = sigma,
                   fill = probability ) ) +
  geom_tile() +
  xlim( c( 10, 30 ) ) +
  ylim( c( 0, 10 ) ) +
  scale_fill_gradient(low="white", high="darkblue")
## Warning: Removed 136 rows containing missing values (geom_tile).

From this result, we would expect the most probable temperature to be around 19-22 degrees C.
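To read the single most probable parameter combination straight off the grid (a small addition, not from the course), pick the row with the highest posterior probability:

# Maximum a posteriori (MAP) estimate from the grid
pars[ which.max( pars$probability ), c( 'mu', 'sigma' ) ]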

Now to try with the zombie exercise

# The IQ of a bunch of zombies
iq <- c(55, 44, 34, 18, 51, 40, 40, 49, 48, 46)
# Defining the parameter grid
pars <- expand.grid(mu = seq(0, 150, length.out = 100), 
                    sigma = seq(0.1, 50, length.out = 100))
# Defining and calculating the prior density for each parameter combination
pars$mu_prior <- dnorm(pars$mu, mean = 100, sd = 100)
pars$sigma_prior <- dunif(pars$sigma, min = 0.1, max = 50)
pars$prior <- pars$mu_prior * pars$sigma_prior
# Calculating the likelihood for each parameter combination
for(i in 1:nrow(pars)) {
  likelihoods <- dnorm(iq, pars$mu[i], pars$sigma[i])
  pars$likelihood[i] <- prod(likelihoods)
}
# Calculate the probability of each parameter combination
pars$probability <- pars$likelihood * pars$prior
pars$probability <- pars$probability / sum( pars$probability )

# Visualize
ggplot( pars, aes( x = mu, y = sigma,
                   fill = probability ) ) +
  geom_tile() +
  xlim( c( 30, 60 ) ) +
  ylim( c( 5, 20 ) ) +
  scale_fill_gradient(low="white", high="darkblue")
## Warning: Removed 9400 rows containing missing values (geom_tile).

Generate the same visualization as in the course using the levelplot from the lattice library

coul <- colorRampPalette(brewer.pal(8, "PuRd"))(25)
levelplot(probability ~ mu * sigma, data = pars, col.regions = coul)

From this visualization, we see that the mean zombie IQ is credibly around 42, give or take about 10.
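One way to back that up numerically (a sketch, not from the course) is to sum the joint posterior over sigma to obtain the marginal distribution of mu and then look at its peak:

# Marginal posterior of mu, obtained by summing the grid over sigma
mu_marginal <- aggregate( probability ~ mu, data = pars, FUN = sum )
mu_marginal[ which.max( mu_marginal$probability ), ]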

Answering the Question: Should I have a beach party?

Revisiting the lake:

  • What’s likely the average water temperature on 20th of July?
  • What’s the probability that the water temperature is going to be 18 or more on the next 20th of July?
temp <- c( 19, 23, 20, 17, 23 )
mu <- seq( 8, 30, by = 0.5 )
sigma <- seq( 0.1, 10, by = 0.3 )
pars <- expand.grid( mu = mu, sigma = sigma )
#the priors
pars$mu_prior <- dnorm( pars$mu, mean = 18, sd = 5 )
pars$sigma_prior <- dunif( pars$sigma, min = 0, max = 10 )
pars$prior <- pars$mu_prior * pars$sigma_prior
for( i in 1:nrow( pars ) ) {
  likelihoods <- dnorm( temp, pars$mu[i], pars$sigma[i])
  pars$likelihood[i] <- prod( likelihoods )
}

#calculate the posterior probability
pars$probability <- pars$likelihood * pars$prior
pars$probability <- pars$probability / sum( pars$probability )

summary( pars )
##        mu           sigma          mu_prior         sigma_prior 
##  Min.   : 8.0   Min.   : 0.10   Min.   :0.004479   Min.   :0.1  
##  1st Qu.:13.5   1st Qu.: 2.50   1st Qu.:0.018810   1st Qu.:0.1  
##  Median :19.0   Median : 5.05   Median :0.043570   Median :0.1  
##  Mean   :19.0   Mean   : 5.05   Mean   :0.043233   Mean   :0.1  
##  3rd Qu.:24.5   3rd Qu.: 7.60   3rd Qu.:0.066645   3rd Qu.:0.1  
##  Max.   :30.0   Max.   :10.00   Max.   :0.079788   Max.   :0.1  
##      prior             likelihood         probability       
##  Min.   :0.0004479   Min.   :0.000e+00   Min.   :0.000e+00  
##  1st Qu.:0.0018810   1st Qu.:4.000e-12   1st Qu.:2.000e-09  
##  Median :0.0043570   Median :1.972e-08   Median :1.106e-05  
##  Mean   :0.0043233   Mean   :4.214e-07   Mean   :6.536e-04  
##  3rd Qu.:0.0066645   3rd Qu.:1.664e-07   3rd Qu.:2.307e-04  
##  Max.   :0.0079788   Max.   :1.175e-05   Max.   :1.891e-02
sample_indices <- sample( 1:nrow( pars ), size = 10000,
                          replace = TRUE, prob = pars$probability )
head( sample_indices )
## [1]  614  343  604  387  557 1156
pars_sample <- pars[sample_indices, c( 'mu', 'sigma' ) ]
head( pars_sample )
##        mu sigma
## 614  22.0   4.0
## 343  21.5   2.2
## 604  17.0   4.0
## 387  21.0   2.5
## 557  16.0   3.7
## 1156 23.0   7.6

now that we have generated samples from the distribution, let’s plot them:

ggplot( pars_sample, aes( x = mu ) ) +
  geom_histogram( bins = 38 ) +
  ggtitle( 'Histogram of Mean Temps Simulated Data' )

what’s a good credible interval for the temperature?

quantile( pars_sample$mu, c( 0.05, 0.95 ) )
##   5%  95% 
## 17.5 23.0

What is the probability that the temp is 18 or greater on July 20th?

#feed in the simulated data
pred_temp <- data.frame( 
  'pred_temp' = rnorm( 10000, 
                       mean = pars_sample$mu,
                       sd = pars_sample$sigma ) )

ggplot( pred_temp, aes( x = pred_temp ) ) +
  geom_histogram( bins = 38 ) +
  ggtitle( 'Histogram of Mean Temps Predicted Data' ) +
  geom_vline( xintercept = 18, color = 'red' )

The probability that the temp is >= 18:

sum( pred_temp$pred_temp >= 18 ) / nrow( pred_temp )
## [1] 0.736

Now to try some similar calculations about the Zombie IQ problem

# The IQ of a bunch of zombies
iq <- c(55, 44, 34, 18, 51, 40, 40, 49, 48, 46)
# Defining the parameter grid
pars <- expand.grid(mu = seq(0, 150, length.out = 100), 
                    sigma = seq(0.1, 50, length.out = 100))
# Defining and calculating the prior density for each parameter combination
pars$mu_prior <- dnorm(pars$mu, mean = 100, sd = 100)
pars$sigma_prior <- dunif(pars$sigma, min = 0.1, max = 50)
pars$prior <- pars$mu_prior * pars$sigma_prior
# Calculating the likelihood for each parameter combination
for(i in 1:nrow(pars)) {
  likelihoods <- dnorm(iq, pars$mu[i], pars$sigma[i])
  pars$likelihood[i] <- prod(likelihoods)
}
# Calculate the probability of each parameter combination
pars$probability <- pars$likelihood * pars$prior
pars$probability <- pars$probability / sum( pars$probability ) 
sample_indices <- sample( nrow(pars), size = 10000,
    replace = TRUE, prob = pars$probability)
head(sample_indices)
## [1] 1831 2829 2328 1630 2826 3126
# Sample from pars to calculate some new measures
pars_sample <- pars[sample_indices, c("mu", "sigma")]

# Visualize the mean IQ
hist(pars_sample$mu, 100, col = 'blue')

# Calculate quantiles
qtile_mu <- quantile( pars_sample$mu, c( 0.025, 0.5, 0.975 ) )
ggplot( pars_sample, aes( x = mu ) ) +
  geom_histogram( bins = 40 ) +
  ggtitle( 'Histogram of Sampled Mean Zombie IQ',
           subtitle = 'median & 95% credible interval given in red') +
  geom_vline( xintercept = qtile_mu[1], color = 'red' ) +
  geom_vline( xintercept = qtile_mu[2], color = 'red',
              linetype = 'dashed' ) +
  geom_vline( xintercept = qtile_mu[3], color = 'red' )

Therefore, we estimate the mean zombie IQ to be 42 (95% CI: [35, 50])

But how smart will the next zombie be?
Calculate the probability that the next zombie you’ll meet will have an IQ of 60 or more…

pred_iq <- data.frame( 
  'pred_iq' = rnorm(10000, 
                    mean = pars_sample$mu, 
                    sd = pars_sample$sigma ) )

# Visualize pred_iq
ggplot(pred_iq, aes( x = pred_iq ) ) +
  geom_histogram() +
  geom_vline( xintercept = 60, color = 'red' )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Calculate the probability of a zombie being "smart" (+60 IQ)
sum( pred_iq$pred_iq >= 60 ) / nrow( pred_iq )
## [1] 0.0885

A Practical Tool: BEST

BEST:

  • A Bayesian model developed by John Kruschke
  • Assumes the data come from a t-distribution (another generative model)
  • the t-distribution is similar to the normal, but it has an additional parameter, the degrees of freedom. When df is high, the t-distribution looks a lot like the normal; when df is low, it has heavier tails and so is less thrown off by outliers (see the quick comparison after this list).
  • BEST estimates the mean, standard deviation, and degrees of freedom parameters.
  • Uses Markov chain Monte Carlo method to fit the model
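To see the heavier tails that make the t-distribution robust (an illustrative check, not from the course), compare the tail probability of a t-distribution with few degrees of freedom against the normal:

# Probability of an observation more than 3 scale units from the center
2 * pt( -3, df = 2 )   # t-distribution with 2 degrees of freedom: ~0.10
2 * pnorm( -3 )        # standard normal: ~0.003

Now fit the BEST model to the zombie IQ data: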
iq <- c( 55, 44, 34, 18, 51, 40, 40, 49, 48, 46 )
fit <- BESTmcmc( iq )
## Waiting for parallel processing to complete...done.
fit
## MCMC fit results for BEST analysis:
## 100002 simulations saved.
##        mean     sd median  HDIlo HDIup  Rhat n.eff
## mu    43.17  3.890  43.26 35.621 50.97 1.000 49386
## nu    28.73 28.461  19.59  1.000 86.04 1.000 15876
## sigma 11.12  3.785  10.53  4.785 18.65 1.001 19888
## 
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## 'Rhat' is the potential scale reduction factor (at convergence, Rhat=1).
## 'n.eff' is a crude measure of effective sample size.
plot( fit )

# The IQ of zombies on a regular diet and a brain based diet.
iq_brains <- c(44, 52, 42, 66, 53, 42, 55, 57, 56, 51)
iq_regular <- c(55, 44, 34, 18, 51, 40, 40, 49, 48, 46)

# Calculate the mean difference in IQ between the two groups
mean(iq_brains) - mean(iq_regular)
## [1] 9.3
# Fit the BEST model to the data from both groups
best_posterior <- BESTmcmc(iq_brains, iq_regular)
## Waiting for parallel processing to complete...done.
# Plot the model result
plot( best_posterior )

Using the t-distribution as a generative model makes the BEST fit more robust to outliers. Compare the difference of sample means with the BEST model difference

# The IQ of zombies given a regular diet and a brain based diet.
iq_brains <- c(44, 52, 42, 66, 53, 42, 55, 57, 56, 51)
iq_regular <- c(55, 44, 34, 18, 51, 40, 40, 49, 48, 150) # <- Mutant zombie

# Modify the data above and calculate the difference in means
mean(iq_brains) - mean(iq_regular)
## [1] -1.1
# Fit the BEST model to the modified data and plot the result
best_posterior <- BESTmcmc(iq_brains, iq_regular)
## Waiting for parallel processing to complete...done.
plot( best_posterior )
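As a follow-up (an assumption-laden sketch: the object returned by BESTmcmc for a two-group comparison behaves like a data frame with columns mu1 and mu2 holding the posterior draws of the two group means), the posterior probability that the brain-based diet group has the higher mean IQ can be computed directly from the MCMC samples:

# Posterior probability that the brain-diet mean exceeds the regular-diet mean
mean( best_posterior$mu1 > best_posterior$mu2 )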

What have you learned? What did we miss?

Bayesian Inference:

  • Data
  • Bayesian Model
    • Generative Model
    • Priors
  • Computational Method