library( dplyr )
library( ggplot2 )
library( gridExtra )
library( tidyverse )
library( ggridges )
library( ggExtra )
library( lattice )
library( RColorBrewer )
library( BEST )
What is Bayesian Data Analysis?
A First Taste of Bayes
Bayesian Inference in a Nutshell:
A method for figuring out unobservable quantities given known facts that uses probability to describe the uncertainty over what the values of the unknown quantities could be.
The course gives a very interesting summary of Alan Turing’s use of Bayesian inference to crack the Enigma code (using decrypted messages to find the probable settings of the Enigma machine wheels). A fuller treatment of the topic can be found here: Alan Turing and Enigma Statistics
Bayesian Data Analysis:
- The use of Bayesian inference to learn about parameters from data (the known facts).
- Can be used for hypothesis testing, linear regression, etc.
- Its real power: it is flexible and allows you to construct problem-specific models
Bayesian Data Analysis: a tool to make sense of your data
Let’s Try Some Bayesian Data Analysis
Bayesian inference == Probabilistic inference.
Probability:
- A number from 0 to 1
- A statement about the certainty / uncertainty of an event
- 1 is complete certainty that a case is TRUE
- 0 is complete certainty that a case is FALSE
- Not only for binary yes / no events, but can be applied to continuous variables
The role of probability distributions in Bayesian data analysis is to represent uncertainty, and the role of Bayesian inference is to update probability distributions to reflect what has been learned from data.
Let’s look at a brief illustrative example.
A Bayesian model for the proportion of success
ex: What is the proportion of successful treatments with a new drug?
prop_model( data )
where data is a vector of successes and failures represented as 1s and 0s. The model assumes:
- There is an unknown underlying proportion of success
- Whether a data point is a success is affected only by this proportion of success
- Prior to seeing any data, any underlying proportion of success is equally likely
- The result is a probability distribution that represents what the model knows about the underlying proportion of success
The output of prop_model is a plot showing what the model learns about the underlying proportion of success from each data point in the order you entered them. At n=0 there is no data, and all the model knows is that it’s equally probable that the proportion of success is anything from 0% to 100%. At n=4 all data has been added, and the model knows a little bit more.
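Under the hood this is a Beta-Binomial conjugate update. Here is a minimal sketch of the idea with a flat Beta(1, 1) prior (the variable names below are illustrative, not part of prop_model):
# Minimal sketch of the Beta-Binomial update behind prop_model
data <- c( 1, 0, 0, 1 )
successes <- sum( data )
failures <- sum( data == 0 )
# With a Beta(1, 1) prior, the posterior for the proportion of success is Beta(1 + successes, 1 + failures)
posterior_sample <- rbeta( 10000, 1 + successes, 1 + failures )
quantile( posterior_sample, c( 0.05, 0.5, 0.95 ) )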
We will build toward implementing prop_model() during this course, but here is the code from the course author Rasmus Bååth:
#The prop_model function - Rasmus Bååth R code
# This function takes a number of successes and failures coded as a TRUE/FALSE
# or 0/1 vector. This should be given as the data argument.
# The result is a visualization of how a Beta-Binomial
# model gradually learns the underlying proportion of successes
# using this data. The function also returns a sample from the
# posterior distribution that can be further manipulated and inspected.
# The default prior is a Beta(1,1) distribution, but this can be set using the
# prior_prop argument.
# Make sure the packages tidyverse and ggridges are installed, otherwise run:
# install.packages(c("tidyverse", "ggridges"))
# Example usage:
# data <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
# prop_model(data)
prop_model <- function(data = c(), prior_prop = c(1, 1), n_draws = 10000,
                       gr_name = "Proportion graph") {
  #library(tidyverse)
  data <- as.logical(data)
  # data_indices decides what densities to plot between the prior and the posterior
  # For 20 datapoints and less we're plotting all of them.
  data_indices <- round(seq(0, length(data), length.out = min(length(data) + 1, 40)))
  # dens_curves will be a data frame with the x & y coordinates for the
  # densities to plot where x = proportion_success and y = probability
  proportion_success <- c(0, seq(0, 1, length.out = 100), 1)
  dens_curves <- map_dfr(data_indices, function(i) {
    value <- ifelse(i == 0, "Prior", ifelse(data[i], "Success", "Failure"))
    label <- paste0("n=", i)
    probability <- dbeta(proportion_success,
                         prior_prop[1] + sum(data[seq_len(i)]),
                         prior_prop[2] + sum(!data[seq_len(i)]))
    probability <- probability / max(probability)
    data_frame(value, label, proportion_success, probability)
  })
  # Turning label and value into factors with the right ordering for the plot
  dens_curves$label <- fct_rev(factor(dens_curves$label, levels = paste0("n=", data_indices)))
  dens_curves$value <- factor(dens_curves$value, levels = c("Prior", "Success", "Failure"))

  graph_label <- paste("Prior likelihood distribution Beta(a =",
                       as.character(prior_prop[1]), ", b =",
                       as.character(prior_prop[2]), ")")
  p <- ggplot(dens_curves, aes(x = proportion_success, y = label,
                               height = probability, fill = value)) +
    ggridges::geom_density_ridges(stat = "identity", color = "white", alpha = 0.8,
                                  panel_scaling = TRUE, size = 1) +
    scale_y_discrete("", expand = c(0.01, 0)) +
    scale_x_continuous("Proportion of success") +
    scale_fill_manual(values = hcl(120 * 2:0 + 15, 100, 65), name = "", drop = FALSE,
                      labels = c("Prior ", "Success ", "Failure ")) +
    ggtitle(paste0(gr_name, ": ", sum(data), " successes, ", sum(!data), " failures"),
            subtitle = graph_label) +
    labs(caption = "based on Rasmus Bååth R code") +
    theme_light() +
    theme(legend.position = "top")
  print(p)
  # Returning a sample from the posterior distribution that can be further
  # manipulated and inspected
  posterior_sample <- rbeta(n_draws, prior_prop[1] + sum(data), prior_prop[2] + sum(!data))
  invisible(posterior_sample)
}
Zombie apocalypse: let us say there is a new drug to treat zombieism. It has never been tested before, but this pilot test yields two recoveries out of 4 subjects. Here, prop_model plots the successive posterior distributions for the proportion of successful treatments:
data <- c( 1, 0, 0, 1 )
prop_model( data )
## Warning: `data_frame()` is deprecated as of tibble 1.1.0.
## Please use `tibble()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
Next we observe several more subjects; however, they are all failures, leaving just 2 successes out of 13 subjects. Here is how the posterior distribution changes with the new case information:
data <- c( 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
prop_model( data )
What if we observe a lot more?
data <- rbinom( 50, 1, 0.2 )
prop_model( data )
Compare this experimental zombie drug’s success with that of the currently used drug which has a 7% success rate:
data2 <- rbinom( 50, 1, 0.07 )
posterior2 <- prop_model( data2 )
head( posterior2 )
## [1] 0.07063968 0.10491471 0.02643282 0.02516204 0.04652902 0.10972237
Subjective observation time: the more information collected about a variable, the closer the posterior distribution approaches the ‘true’ probability of success.
- Take a prior probability distribution and use observations from the data to update it into a posterior probability distribution
- prior - a probability distribution that represents what the model knows before seeing the data
- posterior - a probability distribution that represents what the model knows after having seen the data.
data <- c(1, 0, 0, 1, 0, 0,
          0, 0, 0, 0, 0, 0, 0)
# Extract and explore the posterior
posterior <- prop_model( data )
head( posterior )
## [1] 0.1633159 0.1470803 0.2724994 0.1360496 0.1207068 0.1345991
Plot the posterior distribution:
hist( posterior, breaks = 30, xlim = c( 0,1 ), col = 'palegreen4' )
Calculate a few descriptive statistics for the posterior distribution:
# Calculate the median
median(posterior)
## [1] 0.1850544
# Calculate the credible 90% interval
quantile(posterior, c(0.05, 0.95))
## 5% 95%
## 0.06265585 0.38235821
# Calculate the probability that the success rate is greater than 7%
sum( posterior > 0.07 )/length(posterior)
## [1] 0.9317
Now let’s try to generate a similar plot with ggplot:
post_df <- data.frame( 'posterior' = posterior )
ggplot( post_df, aes( x = posterior ) ) +
geom_histogram( fill = 'palegreen4', color = "#e9ecef" ) +
ggtitle( 'Posterior Probability Distribution' )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
and display some summary statistics:
summary( post_df )
## posterior
## Min. :0.005171
## 1st Qu.:0.124192
## Median :0.185054
## Mean :0.198854
## 3rd Qu.:0.260246
## Max. :0.671825
How to interpret these summary statistics?
“Given the data of two cured and 11 relapsed zombies, there is a 90% probability that the drug cures between 6% and 39% of treated zombies. Further, there is a 93% probability that this new drug is more effective than the current zombie drug.”
Compare the 2 posterior probability distributions in an overlapped histogram:
post_df <- data.frame( 'posterior' = posterior, 'posterior2' = posterior2 )
post_long <- post_df %>%
  pivot_longer( cols = everything(), names_to = 'dist', values_to = 'prob' ) %>%
  ggplot( aes( x = prob, fill = dist ) ) +
  geom_histogram( color = "#e9ecef" ) +
  scale_fill_manual( values = c( "palegreen4", "#404080" ) )
post_long
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
How does Bayesian Inference Work?
The Parts Needed for Bayesian Inference
Bayesian Inference:
- \(\mbox{Bayesian Inference } = \mbox{Data } + \mbox{ a Generative Model } + \mbox{ Priors }\)
- Generative Model: can be a computer program, a mathematical expression, or a set of rules that can be fed parameter values and used to simulate data.
# Parameters
prop_success <- 0.15
n_zombies <- 13
# Simulate the data with rbinom()
data <- rbinom( n_zombies, 1, prop_success )
data
## [1] 0 0 0 1 0 0 0 0 0 0 0 0 0
A similar, more explicit simulation was presented in the course video:
data <- c()
for( zombie in 1 : n_zombies ) {
  data[ zombie ] <- runif( 1, min = 0, max = 1 ) < prop_success
}
data <- as.numeric( data )
data
## [1] 0 0 0 0 0 0 1 0 0 0 1 0 0
This loop gives an idea of the logic behind the previous code; however, using rbinom() is more efficient. Now run the generative model where the success rate is estimated at 42% and the test is done with 100 zombies.
prop_success <- 0.42
n_zombies <- 100
# Simulate the data with rbinom()
data <- rbinom( 1, n_zombies, prop_success )
data
## [1] 51
51 zombies got cured in this run of the generative model. How about if it is run 200 times?
data <- rbinom( 200, n_zombies, prop_success )
data
## [1] 46 32 41 41 42 38 48 38 40 44 44 37 38 46 43 43 47 50 41 42 41 42 40 43 42
## [26] 42 31 42 41 45 46 44 47 48 46 44 45 31 42 49 35 46 39 35 42 42 46 47 41 47
## [51] 40 48 35 40 46 40 38 38 45 36 41 35 36 44 51 40 39 40 43 45 46 39 47 47 44
## [76] 42 45 48 34 32 43 42 43 45 39 44 41 42 40 42 45 48 34 46 43 50 43 35 34 42
## [101] 37 40 45 49 38 36 42 45 28 38 43 41 47 33 37 44 34 48 47 32 36 37 41 48 36
## [126] 38 44 37 39 38 43 40 46 41 44 42 34 37 50 46 38 47 35 41 47 42 40 31 47 42
## [151] 44 40 51 45 46 34 40 37 40 40 42 39 44 43 40 35 45 39 41 45 50 39 37 37 47
## [176] 52 44 45 43 52 33 33 47 49 34 43 39 38 46 44 48 39 30 38 43 37 38 37 43 44
Above are the results of 200 simulations of the binomial generative model with the given parameters. Show some summary stats for this distribution…
summary( data )
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 28.00 38.00 42.00 41.53 45.00 52.00
mean( data )
## [1] 41.535
Using a Generative Model
Apply our generative model to a relatively large number of simulated trials. The results form the probability distribution for our process:
cured_zombies <- rbinom( n = 100000, size = 100, prob = 0.07 )
post_df <- data.frame( 'Cured_Zombies' = cured_zombies )
ggplot( post_df, aes( x = Cured_Zombies ) ) +
geom_histogram( binwidth = 1, fill = 'palegreen4', color = "#e9ecef" ) +
ggtitle( 'Cured Zombies Probability Distribution' )
More often, however, we already have data and do not need a generative model to create it; what we do not know are the parameters that drive the process. Bayesian inference is used to invert the probability, working backwards from the data at hand to learn about the underlying process.
New Situations: instead of the zombie example, let’s look at clickthrough rates….
How many visitors to a site?
To get more visitors to your website you are considering paying for an ad to be shown 100 times on a popular social media site. According to the social media site, their ads get clicked on 10% of the time.
# Fill in the parameters
n_samples <- 100000
n_ads_shown <- 100
proportion_clicks <- 0.1
n_visitors <- rbinom(n_samples, size = n_ads_shown,
                     prob = proportion_clicks)
post_df <- data.frame( 'Site_Visits' = n_visitors )
ggplot( post_df, aes( x = Site_Visits ) ) +
geom_histogram( binwidth = 1, fill = 'cadetblue', color = "#e9ecef" ) +
ggtitle( 'Number of Visitors Distribution' )
Representing Uncertainty with Priors
And now to address the Priors. Prior Probability Distribution: represents how uncertain the model is about the parameters before seeing any data. Ideally, this uncertainty should reflect our uncertainty. The reason it is called a prior is because it represents the uncertainty before having included information about the data.
You’re not so sure that your ad will get clicked on exactly 10% of the time. Instead of assigning proportion_clicks a single value you are now going to assign it a large number of values drawn from a probability distribution.
n_samples <- 100000
n_ads_shown <- 100
# Model a uniform probability distribution that ranges from 0 to 0.2 to represent the uncertainty of our value for this prior
proportion_clicks <- runif( n = n_samples, min = 0.0, max = 0.2 )
n_visitors <- rbinom( n_samples, n_ads_shown, proportion_clicks )
Visualize the probability distribution for the prior, proportion_clicks.
pclick_df <- data.frame( 'probability_click' = proportion_clicks )

pclick_plot <- ggplot( pclick_df, aes( x = probability_click ) ) +
  geom_histogram( fill = 'cadetblue', color = "#e9ecef" ) +
  ggtitle( 'Probability Distribution of the Prior' )

nvis_df <- data.frame( 'Site_Visits' = n_visitors )

nvis_plot <- ggplot( nvis_df, aes( x = Site_Visits ) ) +
  geom_histogram( binwidth = 1, fill = 'cadetblue', color = "#e9ecef" ) +
  ggtitle( 'Number of Visitors Distribution' )

grid.arrange( pclick_plot, nvis_plot, ncol = 2 )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The shape of the n_visitors distribution from this generative model is much different. The added uncertainty in the prior increases the uncertainty in the number of visitors the site will receive.
Bayesian Models and Conditioning
What if we send our ad out into the wilds of the marketplace and get some data on its performance? Let’s say 13 out of 100 users who were shown the ad followed through and clicked to visit the website. With this data, we can now apply some Bayesian inference….
# This data frame represents the joint probability distribution over proportion_clicks and n_visitors together
prior_df <- data.frame( 'proportion_clicks' = proportion_clicks, 'n_visitors' = n_visitors )

# Visualize as a scatterplot....
prior_plot <- ggplot( prior_df, aes( x = n_visitors, y = proportion_clicks ) ) +
  geom_point( alpha = 1/10 ) +
  #geom_jitter( alpha = 1/10 ) +
  theme( legend.position = 'none' )
prior_plot <- ggMarginal( prior_plot, type = 'histogram',
                          xparams = list( bins = 20 ),
                          yparams = list( bins = 20 ) )
prior_plot
The histograms show the marginal distributions. These are the same probability distributions that were described and visualized in the previous exercises.
There is a clear trend in the data: the higher the underlying proportion of clicks, the more visitors there will be to the website
From this joint probability data, we can derive probability distributions for specific conditional situations. For example, if we condition on proportion_clicks = 0.10, then we can describe the distribution of n_visitors:
prior10_df <- prior_df %>%
  mutate( p10 = round( proportion_clicks, 2 ) == 0.1 ) %>%
  filter( p10 == TRUE ) %>%
  select( c( n_visitors, proportion_clicks ) )

prior_plot <- ggplot( prior_df, aes( x = n_visitors, y = proportion_clicks ) ) +
  geom_point( alpha = 1/10 ) +
  geom_point( data = prior10_df, aes( x = n_visitors, y = proportion_clicks, color = 'red' ) ) +
  theme( legend.position = 'none' )
prior_plot <- ggMarginal( prior_plot, type = 'histogram',
                          xparams = list( bins = 20 ),
                          yparams = list( bins = 20 ) )
prior_plot
And now to plot this distribution:
ggplot( prior10_df, aes( x = n_visitors ) ) +
geom_histogram( binwidth = 1, fill = 'red', color = "#e9ecef" ) +
ggtitle( 'Probability Distribution of n_visitors given proportion_clicks = 0.1' )
If we shift proportion_clicks, the distribution of n_visitors shifts accordingly. Additionally, we can condition proportion_clicks on a given n_visitors.
This demonstration brings us to the essence of Bayesian inference:
Bayesian inference is conditioning on data, in order to learn about parameter values
Let’s condition on the data we collected, where we got 13 visits from 100 presentations of the ad. Find the posterior distribution for the proportion of clicks given there were 13 visitors. It is called a posterior because it represents the uncertainty after having included information from the observed data.
head( prior_df )
## proportion_clicks n_visitors
## 1 0.07408206 10
## 2 0.14653305 14
## 3 0.02444900 1
## 4 0.14785947 16
## 5 0.06287937 3
## 6 0.09022803 9
#condition the joint distribution with the observation that there were 13 visitors
# Create the posterior data frame
posterior <- prior_df[prior_df$n_visitors == 13, ]
# Visualize posterior proportion clicks
hist( posterior$proportion_clicks )
Now we can use this posterior as a new prior and predict n_visitors if we rerun the ad campaign:
# Assign posterior to a new variable called prior
prior <- posterior
# Take a look at the first rows in prior
head(prior)
## proportion_clicks n_visitors
## 27 0.13675733 13
## 41 0.13979790 13
## 56 0.10820342 13
## 85 0.08331700 13
## 89 0.10834601 13
## 103 0.07735876 13
# Replace prior$n_visitors with a new sample and visualize the result
n_samples <- nrow(prior)
n_ads_shown <- 100
prior$n_visitors <- rbinom(n_samples, size = n_ads_shown,
                           prob = prior$proportion_clicks)
hist(prior$n_visitors)
# Calculate the probability that you will get 5 or more visitors
sum(prior$n_visitors >= 5) / length(prior$n_visitors)
## [1] 0.9878283
From the observation of 13/100 visits, we find that it is very probable (99%) that future ad campaigns will generate at least 5 visits.
Why use Bayesian Data Analysis?
Four Good Things with Bayes
Bayes is very Flexible!
- Can include information sources in addition to the data
- Background Information
- Expert Opinion
- Common Knowledge
- ex: suppose you ask a vendor what the median and range of success has been for previous ad campaigns.
- Can make many comparisons between groups or data sets
- Can test different experimental conditions
- Can easily find the probable difference between treatment groups
- ex: suppose you have two different treatment groups
- Can use the result of Bayesian Analysis to do Decision Analysis
- Decision Analysis: take the result of a statistical analysis and post-process it to apply to a process of interest.
- the posterior distributions are not of principal interest, rather an outcome to meet a goal (e.g. highest return/clickthrough)
- Can change the underlying statistical model
- if new data indicates the need for a change e.g. from a binomial model to include a new variable(s)
- Bayes is optimal in the small world that is the model
- In Bayesian data analysis there is a separation between model and computation
Example: suppose you ask a vendor what the median and range of success has been for previous ad campaigns. The vendor responds that most ads get 5% clicks with a range of 2-8%. This information can be incorporated by updating the priors of a Bayesian statistical model.
We will use the beta distribution, via rbeta(), as a generative model for our informed prior.
# Explore using the rbeta function
beta_sample <- rbeta(n = 1000000, shape1 = 1, shape2 = 1)
# Visualize the results
hist( beta_sample )
The above distribution is an uninformative prior because it gives equal probability to all values from 0 to 1.
rbeta() takes two shape parameters, both of which must be positive. The larger the shape1 parameter, the closer the mean of the distribution is to 1; the larger the shape2 parameter, the closer the mean is to 0.
# Modify the parameters
beta_sample1 <- data.frame( 'dist' = rbeta(n = 1000000, shape1 = 100, shape2 = 20) )
# Visualize the results
close21 <- ggplot( beta_sample1, aes( x = dist ) ) +
  geom_histogram( fill = 'cadetblue', color = "#e9ecef" ) +
  ggtitle( 'rbeta(shape1 = 100, shape2 = 20)' )

# Modify the parameters
beta_sample2 <- data.frame( 'dist' = rbeta(n = 1000000, shape1 = 20, shape2 = 100) )
# Visualize the results
close20 <- ggplot( beta_sample2, aes( x = dist ) ) +
  geom_histogram( fill = 'cadetblue', color = "#e9ecef" ) +
  ggtitle( 'rbeta(shape1 = 20, shape2 = 100)' )

grid.arrange( close21, close20, ncol = 2 )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
There are many values that could be used to construct a reasonable prior distribution based on the information given to us, but here is a good suggestion:
# Explore using the rbeta function
beta_sample <- rbeta(n = 1000000, shape1 = 5, shape2 = 95)
# Visualize the results
hist( beta_sample )
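As a rough check that Beta(5, 95) matches the vendor’s numbers: the mean of a Beta(a, b) distribution is a / (a + b), so this prior centers on 5%, and the quantiles confirm most of its mass sits near the reported 2-8% range (a quick sketch):
# Check that Beta(5, 95) roughly matches the vendor's report
5 / (5 + 95)                              # theoretical mean = 0.05
quantile( beta_sample, c( 0.05, 0.95 ) )  # central 90% of the prior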
Now for some excitement: let’s update the model developed in the previous chapter with our new, informed prior distribution for proportion_clicks.
Old Prior:
n_draws <- 100000
n_ads_shown <- 100

# Change the prior on proportion_clicks
proportion_clicks <- runif(n_draws, min = 0.0, max = 0.2)
n_visitors <- rbinom(n_draws, size = n_ads_shown,
                     prob = proportion_clicks)
prior <- data.frame(proportion_clicks, n_visitors)
posterior <- prior[prior$n_visitors == 13, ]

# This plots the prior and the posterior in the same plot
par(mfcol = c(2, 1))
hist(prior$proportion_clicks,
     xlim = c(0, 0.25))
hist(posterior$proportion_clicks,
     xlim = c(0, 0.25))
New Prior
n_draws <- 100000
n_ads_shown <- 100

# Change the prior on proportion_clicks
proportion_clicks <- rbeta(n_draws, shape1 = 5, shape2 = 95)
n_visitors <- rbinom(n_draws, size = n_ads_shown,
                     prob = proportion_clicks)
prior <- data.frame(proportion_clicks, n_visitors)
posterior <- prior[prior$n_visitors == 13, ]

# This plots the prior and the posterior in the same plot
par(mfcol = c(2, 1))
hist(prior$proportion_clicks,
     xlim = c(0, 0.25))
hist(posterior$proportion_clicks,
     xlim = c(0, 0.25))
The proportion_clicks prior is dead! Long live the proportion_clicks prior!
Note the change in the prior distribution, but also the leftward shift of the posterior. The information in the informed prior tames the more enthusiastic result from the small data sample we received, with 13% clicks. The posterior distribution is shaped by both the observed data and the prior information. Which posterior should we use? The naive model favors the data we have collected, whereas the informed posterior incorporates the outside information we have gathered. If that information is correct, then the informed posterior should lead to better estimates of future outcomes.
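To make the contrast concrete, here is a small sketch that re-runs both models under distinct names and compares their central 90% intervals (make_posterior is just an illustrative helper, not course code):
# Compare the uniform-prior and informed-prior posteriors side by side
n_draws <- 100000
n_ads_shown <- 100
make_posterior <- function( proportion_clicks ) {
  n_visitors <- rbinom( n_draws, size = n_ads_shown, prob = proportion_clicks )
  proportion_clicks[ n_visitors == 13 ]
}
post_uniform  <- make_posterior( runif( n_draws, min = 0.0, max = 0.2 ) )
post_informed <- make_posterior( rbeta( n_draws, shape1 = 5, shape2 = 95 ) )
quantile( post_uniform, c( 0.05, 0.5, 0.95 ) )
quantile( post_informed, c( 0.05, 0.5, 0.95 ) )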
Contrasts and Comparisons
Comparing Video and Text ads
posterior <- data.frame( 'video_prop' = rbeta(n = 1000000, shape1 = 13, shape2 = 87 ),
                         'text_prop' = rbeta(n = 1000000, shape1 = 6, shape2 = 94 ) ) %>%
  mutate( 'prop_diff' = video_prop - text_prop )
head( posterior )
## video_prop text_prop prop_diff
## 1 0.12683939 0.08997656 0.036862837
## 2 0.14732568 0.07814926 0.069176427
## 3 0.11575964 0.10835300 0.007406641
## 4 0.09275093 0.06422393 0.028526996
## 5 0.12079666 0.03355893 0.087237732
## 6 0.22117221 0.03616603 0.185006184
The new column prop_diff describes the difference in the proportion of clicks between the two conditions.
Visualize:
posterior_long <- posterior %>%
  pivot_longer( cols = everything(), names_to = 'dist', values_to = 'val' )
ggplot( posterior_long, aes( x = val, fill = dist ) ) +
  geom_histogram( alpha = 0.4, position = "identity" )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Define parameters
n_draws <- 100000
n_ads_shown <- 100
proportion_clicks <- runif(n_draws, min = 0.0, max = 0.2)
n_visitors <- rbinom(n = n_draws, size = n_ads_shown,
                     prob = proportion_clicks)
prior <- data.frame(proportion_clicks, n_visitors)

# Create the posteriors for video and text ads
posterior_video <- prior[prior$n_visitors == 13, ]
posterior_text <- prior[prior$n_visitors == 6, ]
# Visualize the posteriors
hist(posterior_video$proportion_clicks, xlim = c(0, 0.25))
hist(posterior_text$proportion_clicks, xlim = c(0, 0.25))
posterior <- data.frame(
  video_prop = posterior_video$proportion_clicks[1:4000],
  text_prop = posterior_text$proportion_clicks[1:4000])

# Calculate the posterior difference: video_prop - text_prop
posterior$prop_diff <- posterior$video_prop - posterior$text_prop

# Visualize prop_diff
hist(posterior$prop_diff, xlim = c(0, 0.25))
# Calculate the median of prop_diff
median(posterior$prop_diff)
## [1] 0.06738542
# Calculate the proportion
# That is, calculate the proportion of samples in posterior$prop_diff that are more than zero.
sum( posterior$prop_diff > 0 ) / length( posterior$prop_diff )
## [1] 0.9515
Decision Analysis
Bayesian data analysis makes it pretty easy to compare and contrast parameter estimates.
Decision Analysis: post-process the posteriors and apply them to a process of interest. For example: now that we know the posteriors, which campaign will be more lucrative, video or text ads?
video_cost <- 0.25      # per video ad
text_cost <- 0.05       # per text ad
visitor_spend <- 2.53   # return for each visitor to site

posterior$video_profit <- posterior$video_prop * visitor_spend - video_cost
posterior$text_profit <- posterior$text_prop * visitor_spend - text_cost
posterior$profit_diff <- posterior$video_profit - posterior$text_profit
glimpse( posterior )
## Rows: 4,000
## Columns: 6
## $ video_prop <dbl> 0.09093018, 0.18179948, 0.17805112, 0.11829070, 0.143525…
## $ text_prop <dbl> 0.05785093, 0.08097478, 0.04782950, 0.05842797, 0.060830…
## $ prop_diff <dbl> 0.033079259, 0.100824699, 0.130221624, 0.059862734, 0.08…
## $ video_profit <dbl> -0.019946634, 0.209952674, 0.200469343, 0.049275476, 0.1…
## $ text_profit <dbl> 0.09636284, 0.15486619, 0.07100863, 0.09782276, 0.103901…
## $ profit_diff <dbl> -0.116309475, 0.055086488, 0.129460709, -0.048547283, 0.…
Which type of ad will give the most money?
posterior_long <- posterior %>%
  select( c( video_profit, text_profit ) ) %>%
  pivot_longer( cols = everything(), names_to = 'dist', values_to = 'val' )
ggplot( posterior_long, aes( x = val, fill = dist ) ) +
  geom_histogram( alpha = 0.4, position = "identity" )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot( posterior, aes( x = profit_diff ) ) +
geom_histogram() +
geom_vline( xintercept = 0, color = 'red' ) +
geom_vline( xintercept = median( posterior$profit_diff ),
color = 'blue' ) +
annotate( geom = 'text',
x = median( posterior$profit_diff ) - 0.01,
y = 150, label = 'Median Value',
color = 'blue', angle = 90 )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Calculate a "best guess" for the difference in profits
median(posterior$profit_diff)
## [1] -0.02951489
# Calculate the probability that text ads are better than video ads
sum( posterior$profit_diff < 0 ) / length( posterior$profit_diff )
## [1] 0.619
In conclusion, even though text ads get a lower proportion of clicks, they are also much cheaper. And, as calculated above, there is a roughly 62% probability that text ads are better.
Change Anything and Everything
Let us say that we can also show a banner ad on a website. We have tested this out and received 19 clicks for a 24 hour period. What is a good generative model to use here?
Poisson Distribution:
- takes 1 parameter: the mean number of events per time unit
rpois() samples from the Poisson distribution.
# Let's say the mean number of clicks is 20/day
n_clicks <- rpois( n = 100000, lambda = 20 )
hist( n_clicks )
Explore the Poisson Distribution more before we move on with Bayes
# Simulate from a Poisson distribution and visualize the result
x <- rpois(n = 10000, lambda = 3)
hist( x )

# Let's say that you run an ice cream stand and on cloudy days you on average sell 11.5 ice creams. It's a cloudy day.
x <- rpois(n = 10000, lambda = 11.5)
hist( x )
#It's still a cloudy day, and unfortunately, you won't break even unless you sell 15 or more ice creams.
# Calculate the probability of break-even
sum( x >= 15 )/length( x )
## [1] 0.1851
Let’s change the generative model to a Poisson model:
# Change the model according to instructions
n_draws <- 100000
mean_clicks <- runif(n_draws, min = 0, max = 80)
n_visitors <- rpois(n = n_draws, mean_clicks)

prior <- data.frame(mean_clicks, n_visitors)
posterior <- prior[prior$n_visitors == 19, ]
# Visualize mean_clicks
hist( prior$mean_clicks )
hist( posterior$mean_clicks)
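From this posterior sample we can also pull out a point estimate and a 90% credible interval for the mean number of clicks per day (a quick sketch using the posterior data frame built above):
# Summarize the posterior for mean_clicks
median( posterior$mean_clicks )
quantile( posterior$mean_clicks, c( 0.05, 0.95 ) )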
Bayes is Optimal, kind of….
Bayes is optimal in the small world of the model.
Bayesian Inference with Bayes’ Theorem
Probability Rules
This chapter is about fitting Bayesian models more efficiently.
Probability Theory
- Probability
- a number between 0 and 1
- A statement of certainty/uncertainty
- Mathematical Notation
- P(n = 1) is a probability
- P(n) is a probability distribution
- P(n = 1|m = 1) is a conditional probability
- P(n|m = 1) is a conditional probability distribution
- Manipulating (independent) Probabilities (a quick numerical check follows this list)
- The Sum Rule
- p( 1 or 2 or 3 ) = 1/6 + 1/6 + 1/6 = 0.5
- The Product Rule
- p( 6 and 6 ) = 1/6 * 1/6 = 1/36 = 2.8%
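A quick numerical check of both rules (a throwaway sketch, not course code):
# Sum rule: mutually exclusive outcomes add
1/6 + 1/6 + 1/6
# Product rule: independent events multiply
(1/6) * (1/6)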
Calculating Likelihoods
Previously we simulated using the rbinom(), rpois(), etc. functions. Now we can calculate probabilities directly using the d-functions, such as dbinom() and dpois().
#calculate P(n_visitors = 13 | prob_success = 10%)
dbinom( 13, size = 100, prob = 0.1)
## [1] 0.07430209
#calculate P(n_visitors = 13 or n_visitors = 14 | prob_success = 10%)
dbinom( 13, size = 100, prob = 0.1 ) + dbinom( 14, size = 100, prob = 0.1 )
## [1] 0.1256059
Calculating probability distributions
#calculate P( n_visitors | prob_success = 0.1 )
n_visitors <- seq( 0, 100, by = 1 )
probability <- dbinom( n_visitors, size = 100, prob = 0.1 )
plot( n_visitors, probability, type = 'h' )
Probability Density: Continuous Distributions
dunif( x = 0.12, min = 0, max = 0.2 )
## [1] 5
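Note that 5 here is a probability density, not a probability: densities of continuous distributions can exceed 1 as long as they integrate to 1 over their support. A quick sanity check (sketch):
# The uniform density integrates to 1 over its support [0, 0.2]
integrate( dunif, lower = 0, upper = 0.2, min = 0, max = 0.2 )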
# Explore using dbinom to calculate probability distributions
n_ads_shown <- 100
proportion_clicks <- 0.1
n_visitors <- seq(0, 100)
prob <- dbinom(n_visitors,
               size = n_ads_shown, prob = proportion_clicks)
# Plot the distribution
plot( n_visitors, prob, type = 'h' )
# Change the code according to the instructions
n_ads_shown <- 100
proportion_clicks <- seq(0, 1, by = 0.01)
n_visitors <- 13
prob <- dbinom(n_visitors,
               size = n_ads_shown, prob = proportion_clicks)
plot(proportion_clicks, prob, type = "h")
Bayesian Calculation
Bayesian Inference by Calculation: instead of simulating probabilities, we will directly calculate them in R.
n_ads_shown <- 100
n_visitors <- seq( 0, 100, by = 1 )
proportion_clicks <- seq( 0, 1, 0.01 )
pars <- expand.grid( proportion_clicks = proportion_clicks,
                     n_visitors = n_visitors )
pars$prior <- dunif( pars$proportion_clicks, min = 0, max = 0.2 )
pars$likelihood <- dbinom( pars$n_visitors, size = n_ads_shown,
                           prob = pars$proportion_clicks )
pars$probability <- pars$likelihood * pars$prior
pars$probability <- pars$probability / sum( pars$probability )
glimpse( pars )
## Rows: 10,201
## Columns: 5
## $ proportion_clicks <dbl> 0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.0…
## $ n_visitors <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ prior <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
## $ likelihood <dbl> 1.000000e+00, 3.660323e-01, 1.326196e-01, 4.755251e…
## $ probability <dbl> 4.761905e-02, 1.743011e-02, 6.315217e-03, 2.264405e…
ggplot( pars, aes( x = n_visitors, y = proportion_clicks,
fill = probability ) ) +
geom_tile() +
xlim( c( 0, 35 ) ) +
ylim( c( 0, 0.2 ) ) +
scale_fill_gradient(low="white", high="darkblue")
## Warning: Removed 9445 rows containing missing values (geom_tile).
pars13 <- pars[ pars$n_visitors == 13, ]
pars13$probability <- pars13$probability / sum( pars13$probability )
ggplot( pars13, aes( x = n_visitors, y = proportion_clicks,
fill = probability ) ) +
geom_tile() +
xlim( c( 0, 35 ) ) +
ylim( c( 0, 0.2 ) ) +
scale_fill_gradient(low="white", high="darkblue")
## Warning: Removed 80 rows containing missing values (geom_tile).
ggplot( pars13, aes( x = proportion_clicks, y = probability ) ) +
geom_area() +
xlim( c( 0, 0.3 ) )
## Warning: Removed 70 rows containing missing values (position_stack).
A conditional shortcut: You can directly condition on the data, no need to first create the joint distribution.
# Simplify the code below by directly conditioning on the data
n_ads_shown <- 100
proportion_clicks <- seq(0, 1, by = 0.01)
n_visitors <- 6
pars <- expand.grid(proportion_clicks = proportion_clicks,
                    n_visitors = n_visitors)
pars$prior <- dunif(pars$proportion_clicks, min = 0, max = 0.2)
pars$likelihood <- dbinom(pars$n_visitors,
                          size = n_ads_shown, prob = pars$proportion_clicks)
pars$probability <- pars$likelihood * pars$prior
pars$probability <- pars$probability / sum(pars$probability)
plot(pars$proportion_clicks, pars$probability, type = "h")
Bayes’ Theorem
This is an implementation of Bayes’ Theorem:
pars$probability <- pars$likelihood * pars$prior
pars$probability <- pars$probability / sum( pars$probability )
Now to write this in proper notation:
\[P(\theta|D) = \frac{\mbox{P}(D|\theta) \cdot \mbox{P}(\theta)}{\sum \mbox{P}(D|\theta) \cdot \mbox{P}(\theta)}\] where:
- \(\theta\) = parameters
- \(D\) = the Data
The probability of the different parameter values given some data equals the likelihood (the relative probability of the data given the different parameter values), multiplied by the prior probability of the parameter values before seeing the data, normalized by the total sum of the likelihood weighted by the prior.
In the previous chapter, we used simulation to find the solution. In this chapter we calculated the probability using an algorithm called grid approximation. However, there are many more algorithms to fit Bayesian models, some being more efficient than others.
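As a recap, the grid approximation used here can be wrapped in a small reusable function. This is just a sketch for the binomial model with a uniform prior from this chapter (grid_posterior is an illustrative name, not from the course):
# Sketch: grid approximation for the proportion-of-clicks model
grid_posterior <- function( n_clicks, n_shown, prior_min = 0, prior_max = 0.2 ) {
  proportion_clicks <- seq( 0, 1, by = 0.01 )
  prior <- dunif( proportion_clicks, min = prior_min, max = prior_max )
  likelihood <- dbinom( n_clicks, size = n_shown, prob = proportion_clicks )
  probability <- likelihood * prior
  data.frame( proportion_clicks, probability = probability / sum( probability ) )
}
head( grid_posterior( n_clicks = 13, n_shown = 100 ) )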
tilde notation: \(\sim \longrightarrow \mbox{is distributed as }\)
More Parameters, More Data, and More Bayes…
The Temperature in a Normal Lake
temp <- c( 19, 23, 20, 17, 23 )
# In Fahrenheit: 66, 73, 68, 63, 73
Use the Normal distribution as a generative model for values centered about a mean value.
rnorm( n = 5, mean = 20, sd = 2 )
## [1] 17.59292 22.73173 22.78236 19.48360 19.16987
Use dnorm() to look at how likely an outcome is given some fixed parameters:
like <- dnorm( x = temp, mean = 20, sd = 2 )
like
## [1] 0.1760327 0.0647588 0.1994711 0.0647588 0.0647588
The results are the relative likelihoods of the data points.
#log likelihood
log(like)
## [1] -1.737086 -2.737086 -1.612086 -2.737086 -2.737086
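Since the data points are assumed independent, the likelihood of the whole data set is the product of the individual likelihoods, usually computed as a sum of log-likelihoods for numerical stability. A quick sketch using the like vector above:
# Joint likelihood of all five temperatures
prod( like )
# Same calculation on the log scale
sum( log( like ) )
exp( sum( log( like ) ) )  # back-transforms to prod( like )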
Using rnorm() and dnorm():
# Assign mu and sigma
mu <- 3500
sigma <- 600

par(mfrow = c( 1, 2 ) )
weight_distr <- rnorm(n = 100000, mean = mu, sd = sigma)
p1 <- hist(weight_distr, 60, xlim = c(0, 6000), col = "lightgreen")

# Create weight
weight <- seq(0, 6000, by = 100)

# Calculate likelihood
likelihood <- dnorm(weight, mu, sigma)

# Plot the distribution of weight
p2 <- plot( x = weight,
            y = likelihood,
            type = 'h' )
A Bayesian Model of Water Temperature
So, we have our temps: temp = 19, 23, 20, 17, 23
and we have our generative model: \(\mbox{temp}_i \sim \mbox{Normal}(\mu,\sigma)\)
Now we need our prior distributions. Our informed guess about water temperatures is that we can expect the following standard deviation and mean:
\(\sigma \sim \mbox{Uniform}(\mbox{min: }0,\mbox{max: }10)\)
\(\mu \sim \mbox{Normal}(\mbox{mean: }18,\mbox{sd: }5)\)
We could have used many other priors, but this is our best guess for now.
Let’s fit our new normal model using the grid approximation method we used earlier:
temp <- c( 19, 23, 20, 17, 23 )
mu <- seq( 8, 30, by = 0.5 )
sigma <- seq( 0.1, 10, by = 0.3 )
pars <- expand.grid( mu = mu, sigma = sigma )

plot( pars, pch = 19, main = 'The Parameter Space' )

# The priors
pars$mu_prior <- dnorm( pars$mu, mean = 18, sd = 5 )
pars$sigma_prior <- dunif( pars$sigma, min = 0, max = 10 )
pars$prior <- pars$mu_prior * pars$sigma_prior
for( i in 1:nrow( pars ) ) {
  likelihoods <- dnorm( temp, pars$mu[i], pars$sigma[i] )
  pars$likelihood[i] <- prod( likelihoods )
}

# Calculate the posterior probability
pars$probability <- pars$likelihood * pars$prior
pars$probability <- pars$probability / sum( pars$probability )
summary( pars )
## mu sigma mu_prior sigma_prior
## Min. : 8.0 Min. : 0.10 Min. :0.004479 Min. :0.1
## 1st Qu.:13.5 1st Qu.: 2.50 1st Qu.:0.018810 1st Qu.:0.1
## Median :19.0 Median : 5.05 Median :0.043570 Median :0.1
## Mean :19.0 Mean : 5.05 Mean :0.043233 Mean :0.1
## 3rd Qu.:24.5 3rd Qu.: 7.60 3rd Qu.:0.066645 3rd Qu.:0.1
## Max. :30.0 Max. :10.00 Max. :0.079788 Max. :0.1
## prior likelihood probability
## Min. :0.0004479 Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:0.0018810 1st Qu.:4.000e-12 1st Qu.:2.000e-09
## Median :0.0043570 Median :1.972e-08 Median :1.106e-05
## Mean :0.0043233 Mean :4.214e-07 Mean :6.536e-04
## 3rd Qu.:0.0066645 3rd Qu.:1.664e-07 3rd Qu.:2.307e-04
## Max. :0.0079788 Max. :1.175e-05 Max. :1.891e-02
Now to visualize:
ggplot( pars, aes( x = mu, y = sigma,
fill = probability ) ) +
geom_tile() +
xlim( c( 10, 30 ) ) +
ylim( c( 0, 10 ) ) +
scale_fill_gradient(low="white", high="darkblue")
## Warning: Removed 136 rows containing missing values (geom_tile).
From this result, we would expect the most probable temperature to be around 19-22 degrees C.
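We can also read the single most probable parameter combination (the MAP estimate) directly off the grid, a quick sketch:
# The maximum a posteriori (MAP) parameter combination from the grid
pars[ which.max( pars$probability ), c( 'mu', 'sigma', 'probability' ) ]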
Now to try with the zombie exercise
# The IQ of a bunch of zombies
iq <- c(55, 44, 34, 18, 51, 40, 40, 49, 48, 46)
# Defining the parameter grid
pars <- expand.grid(mu = seq(0, 150, length.out = 100),
                    sigma = seq(0.1, 50, length.out = 100))
# Defining and calculating the prior density for each parameter combination
pars$mu_prior <- dnorm(pars$mu, mean = 100, sd = 100)
pars$sigma_prior <- dunif(pars$sigma, min = 0.1, max = 50)
pars$prior <- pars$mu_prior * pars$sigma_prior
# Calculating the likelihood for each parameter combination
for(i in 1:nrow(pars)) {
  likelihoods <- dnorm(iq, pars$mu[i], pars$sigma[i])
  pars$likelihood[i] <- prod(likelihoods)
}
# Calculate the probability of each parameter combination
pars$probability <- pars$likelihood * pars$prior
pars$probability <- pars$probability / sum( pars$probability )
# Visualize
ggplot( pars, aes( x = mu, y = sigma,
fill = probability ) ) +
geom_tile() +
xlim( c( 30, 60 ) ) +
ylim( c( 5, 20 ) ) +
scale_fill_gradient(low="white", high="darkblue")
## Warning: Removed 9400 rows containing missing values (geom_tile).
Generate the same visualization as in the course using levelplot() from the lattice library:
coul <- colorRampPalette(brewer.pal(8, "PuRd"))(25)
levelplot(probability ~ mu * sigma, data = pars, col.regions = coul)
From this visualization, we see that a credible range for the mean zombie IQ is roughly 42 +/- 10.
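To make that statement more precise, we can sum the grid probabilities over sigma to get the marginal posterior for mu, then sample from it (a sketch using base R’s aggregate):
# Marginal posterior for mu, summing probability over sigma
mu_marginal <- aggregate( probability ~ mu, data = pars, FUN = sum )
mu_draws <- sample( mu_marginal$mu, size = 10000, replace = TRUE,
                    prob = mu_marginal$probability )
quantile( mu_draws, c( 0.025, 0.975 ) )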
Answering the Question: Should I have a beach party?
Revisiting the lake:
- What’s likely the average water temperature on the 20th of July?
- What’s the probability that the water temperature is going to be 18 or more on the next 20th of July?
temp <- c( 19, 23, 20, 17, 23 )
mu <- seq( 8, 30, by = 0.5 )
sigma <- seq( 0.1, 10, by = 0.3 )
pars <- expand.grid( mu = mu, sigma = sigma )
# The priors
pars$mu_prior <- dnorm( pars$mu, mean = 18, sd = 5 )
pars$sigma_prior <- dunif( pars$sigma, min = 0, max = 10 )
pars$prior <- pars$mu_prior * pars$sigma_prior
for( i in 1:nrow( pars ) ) {
  likelihoods <- dnorm( temp, pars$mu[i], pars$sigma[i] )
  pars$likelihood[i] <- prod( likelihoods )
}

# Calculate the posterior probability
pars$probability <- pars$likelihood * pars$prior
pars$probability <- pars$probability / sum( pars$probability )
summary( pars )
## mu sigma mu_prior sigma_prior
## Min. : 8.0 Min. : 0.10 Min. :0.004479 Min. :0.1
## 1st Qu.:13.5 1st Qu.: 2.50 1st Qu.:0.018810 1st Qu.:0.1
## Median :19.0 Median : 5.05 Median :0.043570 Median :0.1
## Mean :19.0 Mean : 5.05 Mean :0.043233 Mean :0.1
## 3rd Qu.:24.5 3rd Qu.: 7.60 3rd Qu.:0.066645 3rd Qu.:0.1
## Max. :30.0 Max. :10.00 Max. :0.079788 Max. :0.1
## prior likelihood probability
## Min. :0.0004479 Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:0.0018810 1st Qu.:4.000e-12 1st Qu.:2.000e-09
## Median :0.0043570 Median :1.972e-08 Median :1.106e-05
## Mean :0.0043233 Mean :4.214e-07 Mean :6.536e-04
## 3rd Qu.:0.0066645 3rd Qu.:1.664e-07 3rd Qu.:2.307e-04
## Max. :0.0079788 Max. :1.175e-05 Max. :1.891e-02
sample_indices <- sample( 1:nrow( pars ), size = 10000,
                          replace = TRUE, prob = pars$probability )
head( sample_indices )
## [1] 614 343 604 387 557 1156
pars_sample <- pars[sample_indices, c( 'mu', 'sigma' ) ]
head( pars_sample )
## mu sigma
## 614 22.0 4.0
## 343 21.5 2.2
## 604 17.0 4.0
## 387 21.0 2.5
## 557 16.0 3.7
## 1156 23.0 7.6
Now that we have generated samples from the distribution, let’s plot them:
ggplot( pars_sample, aes( x = mu ) ) +
geom_histogram( bins = 38 ) +
ggtitle( 'Histogram of Mean Temps Simulated Data' )
What’s a 90% credible interval for the temperature?
quantile( pars_sample$mu, c( 0.05, 0.95 ) )
## 5% 95%
## 17.5 23.0
What is the probability that the temp is 18 or greater on July 20th?
#feed in the simulated data
pred_temp <- data.frame(
  'pred_temp' = rnorm( 10000,
                       mean = pars_sample$mu,
                       sd = pars_sample$sigma ) )
ggplot( pred_temp, aes( x = pred_temp ) ) +
geom_histogram( bins = 38 ) +
ggtitle( 'Histogram of Mean Temps Predicted Data' ) +
geom_vline( xintercept = 18, color = 'red' )
The probability that the temp is >= 18:
sum( pred_temp$pred_temp >= 18 ) / length( pred_temp$pred_temp )
## [1] 0.736
Now to try some similar calculations about the Zombie IQ problem
# The IQ of a bunch of zombies
iq <- c(55, 44, 34, 18, 51, 40, 40, 49, 48, 46)
# Defining the parameter grid
pars <- expand.grid(mu = seq(0, 150, length.out = 100),
                    sigma = seq(0.1, 50, length.out = 100))
# Defining and calculating the prior density for each parameter combination
pars$mu_prior <- dnorm(pars$mu, mean = 100, sd = 100)
pars$sigma_prior <- dunif(pars$sigma, min = 0.1, max = 50)
pars$prior <- pars$mu_prior * pars$sigma_prior
# Calculating the likelihood for each parameter combination
for(i in 1:nrow(pars)) {
  likelihoods <- dnorm(iq, pars$mu[i], pars$sigma[i])
  pars$likelihood[i] <- prod(likelihoods)
}
# Calculate the probability of each parameter combination
pars$probability <- pars$likelihood * pars$prior
pars$probability <- pars$probability / sum( pars$probability )
sample_indices <- sample( nrow(pars), size = 10000,
                          replace = TRUE, prob = pars$probability)
head(sample_indices)
## [1] 1831 2829 2328 1630 2826 3126
# Sample from pars to calculate some new measures
<- pars[sample_indices, c("mu", "sigma")]
pars_sample
# Visualize the mean IQ
hist(pars_sample$mu, 100, col = 'blue')
# Calculate quantiles
qtile_mu <- quantile( pars_sample$mu, c( 0.025, 0.5, 0.975 ) )
ggplot( pars_sample, aes( x = mu ) ) +
geom_histogram( bins = 40 ) +
ggtitle( 'Histogram of Sampled Mean IQs',
subtitle = 'median & 95% credible interval given in red') +
geom_vline( xintercept = qtile_mu[1], color = 'red' ) +
geom_vline( xintercept = qtile_mu[2], color = 'red',
linetype = 'dashed' ) +
geom_vline( xintercept = qtile_mu[3], color = 'red' )
Therefore, we estimate the mean zombie IQ to be 42 (95% CI: [35, 50])
But how smart will the next zombie be?
Calculate the probability that the next zombie you’ll meet will have an IQ of 60 or more…
pred_iq <- data.frame(
  'pred_iq' = rnorm(10000,
                    mean = pars_sample$mu,
                    sd = pars_sample$sigma ) )
# Visualize pred_iq
ggplot(pred_iq, aes( x = pred_iq ) ) +
geom_histogram() +
geom_vline( xintercept = 60, color = 'red' )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Calculate the probability of a zombie being "smart" (+60 IQ)
sum( pred_iq$pred_iq >= 60 ) / length( pred_iq$pred_iq )
## [1] 0.0885
A Practical Tool: BEST
BEST:
- A Bayesian model developed by John Kruschke
- Assumes the data come from a t-distribution (another generative model)
- the t-distribution is similar to the normal but has an additional parameter, the degrees of freedom (df). When df is high, the t-distribution looks a lot like the normal; when df is low, it has heavier tails and accommodates outliers.
- BEST estimates the mean, standard deviation, and degrees of freedom parameters.
- Uses the Markov chain Monte Carlo (MCMC) method to fit the model
iq <- c( 55, 44, 34, 18, 51, 40, 40, 49, 48, 46 )
fit <- BESTmcmc( iq )
## Waiting for parallel processing to complete...done.
fit
## MCMC fit results for BEST analysis:
## 100002 simulations saved.
## mean sd median HDIlo HDIup Rhat n.eff
## mu 43.17 3.890 43.26 35.621 50.97 1.000 49386
## nu 28.73 28.461 19.59 1.000 86.04 1.000 15876
## sigma 11.12 3.785 10.53 4.785 18.65 1.001 19888
##
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## 'Rhat' is the potential scale reduction factor (at convergence, Rhat=1).
## 'n.eff' is a crude measure of effective sample size.
plot( fit )
# The IQ of zombies on a regular diet and a brain based diet.
iq_brains <- c(44, 52, 42, 66, 53, 42, 55, 57, 56, 51)
iq_regular <- c(55, 44, 34, 18, 51, 40, 40, 49, 48, 46)
# Calculate the mean difference in IQ between the two groups
mean(iq_brains) - mean(iq_regular)
## [1] 9.3
# Fit the BEST model to the data from both groups
best_posterior <- BESTmcmc(iq_brains, iq_regular)
## Waiting for parallel processing to complete...done.
# Plot the model result
plot( best_posterior )
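The object returned by BESTmcmc() contains the posterior draws, so we can post-process them directly. A sketch, assuming the two-group fit exposes the group means as columns mu1 and mu2 (check the BEST package documentation for the exact names):
# Posterior probability that the brain-based diet raises mean IQ
mu_diff <- best_posterior$mu1 - best_posterior$mu2
mean( mu_diff > 0 )                     # probability the difference is positive
quantile( mu_diff, c( 0.025, 0.975 ) )  # 95% credible interval for the difference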
Using the t-distribution as a generative model makes the BEST fit more robust to outliers. Compare the difference of sample means with the BEST model’s estimated difference:
# The IQ of zombies given a regular diet and a brain based diet.
iq_brains <- c(44, 52, 42, 66, 53, 42, 55, 57, 56, 51)
iq_regular <- c(55, 44, 34, 18, 51, 40, 40, 49, 48, 150)  # <- Mutant zombie
# Modify the data above and calculate the difference in means
mean(iq_brains) - mean(iq_regular)
## [1] -1.1
# Fit the BEST model to the modified data and plot the result
best_posterior <- BESTmcmc(iq_brains, iq_regular)
## Waiting for parallel processing to complete...done.
plot( best_posterior )
What have you learned? What did we miss?
Bayesian Inference:
- Data
- Bayesian Model
- Generative Model
- Priors
- Computational Method