Bayesian data analysis for cognitive science

This page showcases my solutions to exercises from “An Introduction to Bayesian Data Analysis for Cognitive Science” by Bruno Nicenboim, Daniel J. Schad, and Shravan Vasishth, demonstrating my proficiency in R programming. Chapters are separated into different files.

I welcome corrections for any mistakes and am open to discussions to further enhance understanding and application of these concepts.

This page presents R code snippets for exercises 1.1 to 1.8 in Chapter 1, Probability.

Set up

# Set CRAN mirror
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Load required libraries
library(extraDistr)
library(MASS)
library(ggplot2)

# Set a seed so that all simulated data are reproducible
set.seed(3112025)

Exercise 1.1

Given a normal distribution with mean 500 and standard deviation 100, use the pnorm() function to calculate the probability of obtaining values between 200 and 800 from this distribution.

between_200_800_ex1 <- pnorm(800, mean = 500, sd = 100)-pnorm(200, mean = 500, sd = 100)
cat("Probability of obtaining values between 200 and 800 is ", between_200_800_ex1)

## Probability of obtaining values between 200 and 800 is  0.9973002
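
As a sanity check, the same probability can be obtained by standardizing the bounds first. This is a minimal sketch; the variable names (`z_lower`, `z_upper`, `p_standard`, `p_direct`) are my own and not from the book. Since 200 and 800 are exactly three standard deviations from the mean, the result matches the familiar "three sigma" rule.

```r
# Standardize the bounds: z = (x - mean) / sd
z_lower <- (200 - 500) / 100  # -3
z_upper <- (800 - 500) / 100  #  3

# Probability on the standard normal scale (mean 0, sd 1 are the pnorm defaults)
p_standard <- pnorm(z_upper) - pnorm(z_lower)

# Probability computed directly on the original scale
p_direct <- pnorm(800, mean = 500, sd = 100) - pnorm(200, mean = 500, sd = 100)

# Both routes should agree up to floating-point tolerance
all.equal(p_standard, p_direct)
```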

Exercise 1.2

Calculate the following probabilities. Given a normal distribution with mean 800 and standard deviation 150, what is the probability of obtaining:
- a score of 700 or less
- a score of 900 or more
- a score of 800 or more

#list of probabilities
probs_x1.2 <- c(
  pnorm(700, mean = 800, sd = 150),
  1 - pnorm(900, mean = 800, sd = 150),
  1 - pnorm(800, mean = 800, sd = 150)
)
#list of values
val_x1.2 <- c("<=700", ">=900", ">=800")
#print result
for (i in seq_along(val_x1.2)) {
  cat(paste("Probability a score of ", val_x1.2[i], " is ", probs_x1.2[i]), "\n")
}

## Probability a score of  <=700  is  0.252492537546923 
## Probability a score of  >=900  is  0.252492537546923 
## Probability a score of  >=800  is  0.5
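
The first two probabilities are identical because 700 and 900 lie equally far from the mean of 800 and the normal distribution is symmetric. The sketch below checks this, and also shows the `lower.tail = FALSE` argument of `pnorm()` as an alternative to writing `1 - pnorm(...)`. The names `p_low` and `p_high` are mine.

```r
# Lower tail: P(X <= 700)
p_low <- pnorm(700, mean = 800, sd = 150)

# Upper tail: P(X >= 900), requested directly instead of using 1 - pnorm(...)
p_high <- pnorm(900, mean = 800, sd = 150, lower.tail = FALSE)

# By symmetry around the mean, the two tail areas should match
all.equal(p_low, p_high)
```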

Exercise 1.3

Given a normal distribution with mean 600 and standard deviation 200, what is the probability of obtaining:
- a score of 550 or less.
- a score between 300 and 800.
- a score of 900 or more.

probs_1.3 <- c(pnorm(550, mean = 600, sd = 200),
               pnorm(800, mean = 600, sd = 200) - pnorm(300, mean = 600, sd = 200),
               1 - pnorm(900, mean = 600, sd = 200))
Val_1.3 <- c("<=550", "between 300 and 800", ">=900")
#print results
for (i in seq_along(Val_1.3)) {
  cat(paste("Probability of obtaining a score ", Val_1.3[i], " is ", probs_1.3[i]), "\n")
}

## Probability of obtaining a score  <=550  is  0.401293674317076 
## Probability of obtaining a score  between 300 and 800  is  0.774537544799685 
## Probability of obtaining a score  >=900  is  0.0668072012688581

Exercise 1.4

Consider a normal distribution with mean 1 and standard deviation 1. Compute the lower and upper boundaries such that:
- the area (the probability) to the left of the lower boundary is 0.10.
- the area (the probability) to the left of the upper boundary is 0.90.

#the area (the probability) to the left of the lower boundary is 0.10.
lower_boundary_x1.4 <- qnorm(0.10, mean = 1, sd = 1)
#the area (the probability) to the left of the upper boundary is 0.90
upper_boundary_x1.4 <- qnorm(0.90, mean = 1, sd = 1)
#print results
cat(paste("The lower boundary is ", lower_boundary_x1.4, " and the upper boundary is ", upper_boundary_x1.4))

## The lower boundary is  -0.281551565544601  and the upper boundary is  2.2815515655446
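
Because `qnorm()` is the inverse of `pnorm()`, feeding the boundaries back into `pnorm()` should recover the requested areas. This is a small round-trip check of my own (the names `bounds` and `areas` are not from the book).

```r
# Boundaries for the requested left-tail areas
bounds <- qnorm(c(0.10, 0.90), mean = 1, sd = 1)

# Round trip: the area to the left of each boundary
areas <- pnorm(bounds, mean = 1, sd = 1)

# Should recover 0.10 and 0.90 up to floating-point tolerance
all.equal(areas, c(0.10, 0.90))
```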

Exercise 1.5

Given a normal distribution with mean 650 and standard deviation 125. There exist two quantiles, the lower quantile q1 and the upper quantile q2, that are equidistant from the mean 650, such that the area under the curve of the normal between q1 and q2 is 80%. Find q1 and q2.

q1_1.5 <- sprintf("%.2f",(qnorm(0.10, mean = 650, sd = 125)))
q2_1.5 <- sprintf("%.2f",(qnorm(0.90, mean = 650, sd = 125)))
#print
cat(paste("The q1 is ", q1_1.5, "and the q2 is ", q2_1.5))

## The q1 is  489.81 and the q2 is  810.19
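
The probabilities 0.10 and 0.90 come from splitting the remaining 20% equally between the two tails. The sketch below spells that step out and then verifies both conditions of the exercise. The names `coverage`, `tail_prob`, and `q` are my own.

```r
coverage <- 0.80
tail_prob <- (1 - coverage) / 2  # 0.10 left in each tail

# Lower and upper quantiles
q <- qnorm(c(tail_prob, 1 - tail_prob), mean = 650, sd = 125)

# Area under the curve between q1 and q2; should be 0.80
area_between <- diff(pnorm(q, mean = 650, sd = 125))

# Distances from the mean; should be equal
dist_lower <- 650 - q[1]
dist_upper <- q[2] - 650

all.equal(area_between, coverage)
all.equal(dist_lower, dist_upper)
```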

Exercise 1.6

Given data that is generated as follows:
data_gen1 <- rnorm(1000, mean = 300, sd = 200)
Calculate the mean, variance, and the lower quantile q1 and the upper quantile q2, that are equidistant and such that the range of probability between them is 80%.

data_1.6 <- rnorm(1000, mean = 300, sd = 200) #renamed from data_gen1 to follow the naming scheme in this file

mean_1.6 <- round(mean(data_1.6),2) #use round() because sprintf() converts numeric to character
sd_1.6 <- round(sd(data_1.6), 2) #the exercise asks for the variance, which is the square of this value: var(data_1.6)
quantiles_1.6 <- sprintf("%.2f", quantile(data_1.6, probs = c(0.10, 0.90)))
#print results
cat(paste("The generated data has mean ", mean_1.6, 
          ", standard deviation ", sd_1.6, 
          ", lower quantile ", quantiles_1.6[1],
          ", and upper quantile ", quantiles_1.6[2]))

## The generated data has mean  301.24 , standard deviation  195.64 , lower quantile  57.34 , and upper quantile  564.74
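
The exercise asks for the variance, which `var()` returns directly and which equals the squared standard deviation. The sketch below shows this on a fresh sample and compares the sample quantiles with the theoretical ones from `qnorm()`. Note that it uses a hypothetical seed (`set.seed(1)`) and its own variable names, so its numbers will not match the output above.

```r
set.seed(1)  # hypothetical seed, not the one used in the set-up above
sample_data <- rnorm(1000, mean = 300, sd = 200)

# Variance, and a check that it equals the squared standard deviation
sample_var <- var(sample_data)
all.equal(sample_var, sd(sample_data)^2)

# Sample quantiles next to the theoretical quantiles of Normal(300, 200)
sample_q <- quantile(sample_data, probs = c(0.10, 0.90))
theory_q <- qnorm(c(0.10, 0.90), mean = 300, sd = 200)
rbind(sample = sample_q, theory = theory_q)
```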

Exercise 1.7

This time we generate the data with a truncated normal distribution from the package extraDistr. The details of this distribution will be discussed later in section 4.1 and in the Box 4.1, but for now we can treat it as an unknown generative process:
data_gen2 <- rtnorm(1000, mean = 300, sd = 200, a = 0)
Using the sample data, calculate the mean, variance, and the lower quantile q1 and the upper quantile q2, such that the probability of observing values between these two quantiles is 80%.

data_1.7 <- rtnorm(1000, mean = 300, sd = 200, a = 0) #renamed from data_gen2 to follow the naming scheme in this file

mean_1.7 <- round(mean(data_1.7),2)
var_1.7 <- round(var(data_1.7),2)
quantiles_1.7 <- sprintf("%.2f", quantile(data_1.7, probs = c(0.10, 0.90)))
#print results
cat("data_1.7 has mean ", mean_1.7,
    ", variance ", var_1.7,
    ", lower quantile ", quantiles_1.7[1],
    " and upper quantile ", quantiles_1.7[2])

## data_1.7 has mean  325.11 , variance  32416.47 , lower quantile  88.74  and upper quantile  572.92

Exercise 1.8

Suppose that you have a bivariate distribution where one of the two random variables comes from a normal distribution with mean μX=600 and standard deviation σX=100, and the other from a normal distribution with mean μY=400 and standard deviation σY=50. The correlation ρXY between the two random variables is 0.4. Write down the variance-covariance matrix of this bivariate distribution as a matrix (with numerical values, not mathematical symbols), and then use it to generate 100 pairs of simulated data points. Plot the simulated data such that the relationship between the random variables X and Y is clear. Generate two sets of new data (100 pairs of data points each) with correlation -0.4 and 0, and plot these alongside the plot for the data with correlation 0.4.

#given parameters
mean_X_1.8 <- 600
sd_X_1.8<- 100
mean_Y_1.8 <- 400
sd_Y_1.8 <- 50
corr_values1.8 <- c(0.4,0,-0.4)
mu_1.8 <- c(mean_X_1.8, mean_Y_1.8) #mean X and mean Y

df_folder1.8 <- list() #empty list to hold the dataset simulated for each correlation value

for (corr in corr_values1.8){
  #variance-covariance matrix
  sigma_matrix_1.8 <- matrix(c(sd_X_1.8^2, sd_X_1.8 * sd_Y_1.8 * corr, sd_X_1.8 * sd_Y_1.8 * corr, sd_Y_1.8^2), 
                         ncol = 2, byrow = FALSE)
  #generating 100 pairs of data
  data_1.8 <- mvrnorm(n = 100, mu = mu_1.8, Sigma = sigma_matrix_1.8) 
  #converting to data frame
  df_1.8 <- data.frame(X = data_1.8[,1], Y = data_1.8[,2], Correlation = as.character(corr)) 
  #collecting all values into one place df_folder1.8
  df_folder1.8 [[as.character(corr)]] <- df_1.8 
}

df_stacked1.8 <- do.call(rbind, df_folder1.8) #stacking the data frames in df_folder1.8 into a single data frame

#plot
ggplot(df_stacked1.8, aes(x = X, y = Y, color = Correlation)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_linedraw() +
  facet_wrap(~Correlation)

## `geom_smooth()` using formula = 'y ~ x'
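
The exercise also asks for the variance-covariance matrix with numerical values. For the correlation of 0.4, the covariance is 0.4 × 100 × 50 = 2000, and the variances are 100² = 10000 and 50² = 2500. The sketch below builds that matrix with short names of my own (`sd_x`, `sd_y`, `rho`, `Sigma`) and converts it back to a correlation with `cov2cor()` as a check.

```r
sd_x <- 100
sd_y <- 50
rho <- 0.4

# Covariance: rho * sd_x * sd_y = 0.4 * 100 * 50 = 2000
cov_xy <- rho * sd_x * sd_y

# Variance-covariance matrix: variances on the diagonal, covariance off it
Sigma <- matrix(c(sd_x^2, cov_xy,
                  cov_xy, sd_y^2), ncol = 2)
Sigma

# Converting back to a correlation matrix should recover rho
cov2cor(Sigma)[1, 2]
```

For the correlations of -0.4 and 0, only the off-diagonal entries change, to -2000 and 0 respectively.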