Week05 Lab

Q1a. Imagine a bird species with mean clutch size of 3.8 eggs per nest. Assuming egg number is Poisson-distributed, what is the probability of at least two eggs?

probability_less_than_2 <- ppois(1, 3.8)
probability_at_least_2 <- 1 - probability_less_than_2
print(probability_at_least_2)

## [1] 0.8926203
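
As a cross-check, the same upper-tail probability can be computed in one step with the lower.tail argument of ppois(), or by subtracting the two excluded point masses from 1. This is a minimal sketch; the variable names lambda, p_upper, and p_manual are illustrative and not part of the lab.

```r
lambda <- 3.8  # mean clutch size given in Q1a

# P(X >= 2) equals P(X > 1), so request the upper tail directly
p_upper <- ppois(1, lambda, lower.tail = FALSE)

# Same quantity from the point masses: 1 - P(X = 0) - P(X = 1)
p_manual <- 1 - sum(dpois(0:1, lambda))

print(c(p_upper, p_manual))
```

Both routes should agree with the value printed above.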

Q1b. What is the probability of exactly 7 eggs?

probability_exactly_7 <- dpois(7, 3.8)  # lambda = 3.8, the mean clutch size from Q1a
print(probability_exactly_7)

## [1] 0.05078502
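
A point probability like this can also be checked by writing out the Poisson probability mass function, P(X = k) = exp(-lambda) * lambda^k / k!, with lambda set to the mean clutch size of 3.8 stated in Q1a. The sketch below is only a sanity check; lambda, k, and p_by_hand are illustrative names.

```r
lambda <- 3.8  # mean clutch size given in Q1a
k <- 7         # number of eggs asked about in Q1b

# Poisson pmf written out by hand
p_by_hand <- exp(-lambda) * lambda^k / factorial(k)

# TRUE if the hand calculation matches the built-in density
print(all.equal(p_by_hand, dpois(k, lambda)))
print(p_by_hand)
```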

Q2a. If you flip a fair coin ten times, what is the 50% most probable range (i.e., the inter-quartile range) of number of heads you could expect? (Hint: the binomial distribution is a natural one for coin flips. It has two parameters: “size” – the number of coin flips, and “prob” – the probability of heads.)

q_25 <- qbinom(0.25, size = 10, prob = 0.5)
q_75 <- qbinom(0.75, size = 10, prob = 0.5)
c(q_25, q_75)

## [1] 4 6

Q2b. Using the coins provided, flip a coin ten times and record numbers of heads and tails. How many heads did you get? Does your observation fall within the interquartile range?

I got 4 heads. This falls within the interquartile range of 4 to 6 heads (the 25th to 75th percentiles) found in Q2a.
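
Because the binomial distribution is discrete, the range from 4 to 6 heads actually holds more than 50% of the probability. The sketch below shows the cumulative probabilities that determine the two quartiles and then sums the probability of landing inside the range. The variable names are illustrative and not part of the lab.

```r
size <- 10   # number of coin flips
prob <- 0.5  # probability of heads for a fair coin

# Cumulative probabilities P(X <= k) for k = 3..6
cum_probs <- pbinom(3:6, size = size, prob = prob)
names(cum_probs) <- 3:6
print(round(cum_probs, 4))

# Probability of observing between 4 and 6 heads, inclusive
p_in_range <- sum(dbinom(4:6, size = size, prob = prob))
print(p_in_range)
```

The cumulative probability first reaches 0.25 at 4 heads and first reaches 0.75 at 6 heads, which is why qbinom() returns 4 and 6. The range itself covers 672/1024, or about 65.6%, of outcomes.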

Q3a. Use rnorm() to draw 10 random values from a normal distribution with any mean and variance you wish. Call this vector of values x1. Then draw another 10 values from the same normal distribution. Call this x2. Calculate the correlation coefficient between x1 and x2. Note that the “true” correlation is zero (because they were sampled completely independently) but the sample correlation will be different from zero because of sampling error.

x1 <- rnorm(10, mean = 5, sd = 2)
x2 <- rnorm(10, mean = 5, sd = 2)

print(x1)

##  [1] 7.751482 5.893909 5.989102 6.429633 7.515402 5.143992 5.295938 6.288658
##  [9] 4.455625 4.129782

print(x2)

##  [1] 6.8907892 1.3161299 3.7022136 5.6104868 0.2073756 8.4008118 5.0002904
##  [8] 5.7509107 6.2652089 6.8779895

cor(x1, x2)

## [1] -0.4176065

Q3b. Repeat Q3a 100 times using a “for-loop”. (An example is provided below to help get you started.) Plot a histogram of these 100 correlation coefficients. What is the strongest correlation (in absolute value) you could observe simply by chance?

empty_vector <- vector("numeric",length = 100)

for(i in 1:100){
  x1 <- rnorm(10, mean = 5, sd = 2)
  x2 <- rnorm(10, mean = 5, sd = 2)
  empty_vector[i] <- cor(x1,x2)
}

hist(empty_vector, 
     main = "Histogram of Correlation Coefficient", 
     xlab = "correlation coefficient",
     col = "lightblue",
     border = "black",
     breaks = 100)

strongest_correlation <- max(abs(empty_vector))

print(strongest_correlation)

## [1] 0.887325

Q3c. Repeat Q3b but this time draw 1000 random values for x1 and x2 instead of 10. How does this change the distribution of correlation coefficients that you could observe simply by chance?

output <- vector("numeric", length = 100)

for(i in 1:100){
  x1 <- rnorm(1000, mean = 5, sd = 2)
  x2 <- rnorm(1000, mean = 5, sd = 2)
  output[i] <- cor(x1,x2)
}

hist(output, 
     main = "Histogram of Correlation Coefficient", 
     xlab = "correlation coefficient",
     col = "lightblue",
     border = "black",
     breaks = 100)

strongest_correlation <- max(abs(output))

print(strongest_correlation)

## [1] 0.09073092

Increasing the sample size from 10 to 1000 concentrates the distribution of correlation coefficients much more tightly around 0: the strongest correlation observed by chance dropped from about 0.89 to about 0.09. As the sample size increases, random sampling error decreases, so the sample correlation is more likely to reflect the true correlation, which is 0 because x1 and x2 are generated independently.
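
To put a number on this, the standard deviation of the sample correlation between two independent normal samples of size n is 1/sqrt(n - 1), which is about 0.33 for n = 10 and about 0.032 for n = 1000. The sketch below compares that value with the simulated spread. It is not part of the lab: simulate_cor is a hypothetical helper, the seed is arbitrary, and it uses 2000 replicates rather than 100 to reduce noise in the estimate.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical helper: n_reps correlations between two independent samples of size n
simulate_cor <- function(n, n_reps = 2000) {
  replicate(n_reps, cor(rnorm(n, mean = 5, sd = 2), rnorm(n, mean = 5, sd = 2)))
}

# Compare the simulated spread with 1/sqrt(n - 1) for both sample sizes
for (n in c(10, 1000)) {
  r <- simulate_cor(n)
  cat("n =", n,
      " simulated sd =", round(sd(r), 4),
      " 1/sqrt(n - 1) =", round(1 / sqrt(n - 1), 4), "\n")
}
```

Using replicate() instead of a for-loop with a preallocated vector is only a stylistic choice here; both produce the same kind of vector of correlation coefficients.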

Sean Lim

2024-09-26