Question 1

p <- 0.4               # success prob (gets a candy)
q <- 1 - p

## (a) Probability of 8 failures before 5 successes
r_a <- 5               # number of successes
k_a <- 8               # number of failures
# In R: dnbinom(x, size=r, prob=p) = P(#failures = x before r-th success)
prob_a <- dnbinom(k_a, size = r_a, prob = p)

prob_a
## [1] 0.08513638
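
As a quick cross-check, the same value follows from the negative binomial pmf \(P(X = k) = \binom{k+r-1}{k} p^r q^k\) coded by hand:

# Hand-coded negative binomial pmf
prob_a_manual <- choose(k_a + r_a - 1, k_a) * p^r_a * q^k_a
prob_a_manual
## [1] 0.08513638
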
## (b) Probability that we need at least 10 doors to get 4 candies
r_b <- 4

## Method 1 (Binomial tail equivalence):
## "At least 10 trials" <=> at most 3 successes in first 9 trials
prob_b_binom <- pbinom(3, size = 9, prob = p)

## Method 2 (Negative Binomial):
## Y = trials to get 4 successes = failures + 4
## Y >= 10  <=>  failures >= 6, so P(Y >= 10) = 1 - P(failures <= 5)
prob_b_negbin <- 1 - pnbinom(5, size = r_b, prob = p)

prob_b_binom
## [1] 0.4826097
prob_b_negbin
## [1] 0.4826097
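
A simulation agrees to within Monte Carlo error (a quick sketch; the sample size and seed are arbitrary):

set.seed(1)
fails <- rnbinom(1e6, size = r_b, prob = p)  # failures before the 4th success
mean(fails >= 6)                             # should be close to 0.4826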

Question 2

# Rate per week
rate <- 0.2

## (a) 3 weeks: P(X = 1)
lambda_a <- rate * 3
prob_a <- dpois(1, lambda_a)

## (b) 5 weeks: P(X >= 2)
lambda_b <- rate * 5
prob_b <- ppois(1, lambda_b, lower.tail = FALSE)  # = 1 - P(X <= 1)

## (c) 15 weeks: P(X <= 1)
lambda_c <- rate * 15
prob_c <- ppois(1, lambda_c, lower.tail = TRUE)   # P(0)+P(1)

round(c(a = prob_a, b = prob_b, c = prob_c), 6)
##        a        b        c 
## 0.329287 0.264241 0.199148
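
These agree with the closed-form Poisson expressions \(P(X=1)=\lambda e^{-\lambda}\), \(P(X\ge 2)=1-(1+\lambda)e^{-\lambda}\), and \(P(X\le 1)=(1+\lambda)e^{-\lambda}\), coded by hand as a cross-check:

manual_a <- lambda_a * exp(-lambda_a)            # P(X = 1) with lambda = 0.6
manual_b <- 1 - (1 + lambda_b) * exp(-lambda_b)  # P(X >= 2) with lambda = 1
manual_c <- (1 + lambda_c) * exp(-lambda_c)      # P(X <= 1) with lambda = 3
round(c(a = manual_a, b = manual_b, c = manual_c), 6)  # matches the values above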

Question 3

N <- 50      # total population
m <- 10      # tagged (successes) in population
k <- 20      # sample size
x <- 4       # observed tagged in second sample

# Hypergeometric probability P(X = 4)
p_x4 <- dhyper(x, m, N - m, k)
p_x4
## [1] 0.2800586
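
As a cross-check, the same number comes straight from the hypergeometric formula \(P(X=4) = \binom{10}{4}\binom{40}{16} \big/ \binom{50}{20}\):

choose(m, x) * choose(N - m, k - x) / choose(N, k)
## [1] 0.2800586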

Question 4

A probability density function (pdf) is used for continuous random variables.
Two key rules define a valid pdf:

  1. It never goes below 0: \(f(x) \ge 0\).
  2. Its total area under the curve is 1: \(\int_{-\infty}^{\infty} f(x)\,dx = 1\).

The height \(f(x)\) at a single point is not a probability. Probabilities come from areas under the curve over intervals—like \(\Pr(a < X < b) = \int_a^b f(x)\,dx\). Because only area matters, a pdf can be tall and narrow and still have total area 1. That’s why it’s perfectly okay for a pdf to take values greater than 1 at some points.

Example

For a Uniform distribution on \([a,b]\), the pdf is \[ f(x)=\frac{1}{b-a}\quad \text{for } a \le x \le b. \] If the interval is short, say \([0, 0.5]\), then \(f(x)=\frac{1}{0.5}=2>1\).
The area is still \(2 \times 0.5 = 1\), so it’s a valid pdf.

a <- 0; b <- 0.5
height <- 1/(b - a)        # pdf height for Uniform(a,b)
area   <- height * (b - a) # total area

c(pdf_height = height, total_area = area)
## pdf_height total_area 
##          2          1
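
Numerical integration of the Uniform(0, 0.5) density makes the same point (a small sketch using dunif and integrate): the height exceeds 1 while the area is exactly 1.

dunif(0.25, min = a, max = b)      # density at an interior point: 2
integrate(dunif, lower = a, upper = b,
          min = a, max = b)$value  # total area under the curve: 1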

Question 5

The standard normal distribution is perfectly symmetric around zero.
Its density function is \[ \phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \] and because \(\phi(-z) = \phi(z)\), the left and right tails of the distribution have the same probability.

Therefore,
\[ P(Z \le -t) = P(Z \ge t). \]

Now, the event \(|Z| \ge t\) means the random value of \(Z\) is either less than or equal to \(-t\) or greater than or equal to \(t\): \[ P(|Z| \ge t) = P(Z \ge t) + P(Z \le -t). \]

Since the two tails are equal,
\[ P(|Z| \ge t) = P(Z \ge t) + P(Z \ge t) = 2P(Z \ge t). \]

This relationship comes directly from the symmetry of the normal distribution [Continuous Distributions (Standard Normal section)].

t <- 1.5
lhs <- pnorm(-t) + (1 - pnorm(t))   # P(|Z| >= t)
rhs <- 2 * (1 - pnorm(t))           # 2 * P(Z >= t)
c(lhs = lhs, rhs = rhs)
##       lhs       rhs 
## 0.1336144 0.1336144

Question 6

The standard normal distribution is symmetric around 0.
Because of that symmetry, any odd power of \(z\) (such as \(z^3\) or \(z^5\)) is an odd function: it takes equal-magnitude but opposite-sign values at \(+z\) and \(-z\). When we average (integrate) an odd function against a density that is symmetric about 0, the positive and negative contributions cancel, so the result is 0.

The third moment of a normal variable is \[ E[X^3] = 3\mu\sigma^2 + \mu^3. \] Setting \(\mu=0\) and \(\sigma^2=1\) for the standard normal gives \(E[Z^3]=0\). The same cancellation argument (symmetric density, odd integrand) applies to \(E[Z^5]\), so it is also 0.

Answer:
\[ E[Z^3] = 0,\qquad E[Z^5] = 0. \]
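
Both moments can be checked numerically in R, either by integrating the odd integrand against the standard normal density or by simulation (a quick sketch; the Monte Carlo values are only approximate):

integrate(function(z) z^3 * dnorm(z), -Inf, Inf)$value  # ~0
integrate(function(z) z^5 * dnorm(z), -Inf, Inf)$value  # ~0
z <- rnorm(1e6)          # large standard normal sample
c(mean(z^3), mean(z^5))  # both close to 0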


Question 7

The Uniform distribution has pdf

\[ f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b,\\ 0, & \text{otherwise.} \end{cases} \]

  1. Compute the mean

\[ E[X] = \int_a^b x\,f(x)\,dx = \frac{1}{b-a}\int_a^b x\,dx = \frac{a+b}{2}. \]

  2. Compute the second moment

\[ E[X^2] = \int_a^b x^2 f(x)\,dx = \frac{1}{b-a}\int_a^b x^2\,dx = \frac{a^2 + ab + b^2}{3}. \]

  3. Compute the variance

\[ \text{Var}(X) = E[X^2] - (E[X])^2 = \frac{a^2+ab+b^2}{3} - \frac{(a+b)^2}{4} = \frac{(b-a)^2}{12}. \]
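
For a concrete instance (the interval [2, 5] is an arbitrary choice), the formulas give mean \((2+5)/2 = 3.5\) and variance \(3^2/12 = 0.75\), which numerical integration confirms:

a <- 2; b <- 5                                         # example interval
f <- function(x) dunif(x, min = a, max = b)            # Uniform(a, b) pdf
mu  <- integrate(function(x) x   * f(x), a, b)$value   # E[X]
ex2 <- integrate(function(x) x^2 * f(x), a, b)$value   # E[X^2]
c(mean = mu, variance = ex2 - mu^2)                    # 3.5 and 0.75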


Question 8

The model:

\[ y = \beta_0 + \beta_1 x + \beta_2 (x - 1)_+ + \varepsilon, \] where \((x-1)_+ = \max(x-1,\,0)\) and:

- \(\beta_0\) is the intercept,
- \(\beta_1\) is the slope before \(x = 1\),
- \(\beta_2\) is the change in slope after \(x = 1\),
- \(\varepsilon\) is the random error.

Thus, the slope for \(x > 1\) becomes \(\beta_1 + \beta_2\).
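
A minimal sketch of fitting this model in R (the data and seed below are simulated for illustration): the hinge term \((x-1)_+\) is built with pmax, and the fitted coefficients recover the intercept, the pre-break slope, and the slope change.

set.seed(42)                                 # arbitrary seed
x <- seq(0, 2, length.out = 100)
y <- 1 + 2*x + 3*pmax(x - 1, 0) + rnorm(100, sd = 0.2)  # true betas: 1, 2, 3

fit <- lm(y ~ x + pmax(x - 1, 0))            # broken-stick regression
coef(fit)                                    # estimates of beta0, beta1, beta2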

Question 9

In probability terms, “being predictable” means that knowing one variable gives us information about another.
This happens when two variables are dependent.

For two variables \(X\) and \(Y\), in contingency-table language (slide 31 of Notes 13), independence means all rows of conditional probabilities \(P(Y=j \mid X=i)\) are the same.
When those rows differ, \(Y\) is predictable from \(X\): the two are not independent.
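
A small invented contingency table illustrates the idea: in the first table every row of \(P(Y=j \mid X=i)\) is identical (independence), while in the second the rows differ, so \(Y\) is predictable from \(X\).

# Made-up counts for illustration
indep <- matrix(c(20, 20, 40, 40), nrow = 2, byrow = TRUE,
                dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2")))
dep   <- matrix(c(30, 10, 10, 50), nrow = 2, byrow = TRUE,
                dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2")))

prop.table(indep, margin = 1)  # identical rows: Y independent of X
prop.table(dep,   margin = 1)  # different rows: Y predictable from X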


Question 10

This overbooking problem can be modeled using the Binomial distribution.

Let:

- \(p\) = probability that a person shows up (success),

- \(q = 1 - p\) = probability of a no-show,

- \(m\) = number of total reservations made,

- \(n\) = number of rooms available.

Then the number of guests who show up follows a Binomial distribution: \[ X \sim \text{Binomial}(m, p). \]

The goal is to choose \(m\) (the number of bookings to accept) so that: \[ P(X > n) \text{ is small, e.g., } 0.01 \text{ or } 0.05. \]

This means we allow only a small chance that more people show up than there are rooms available.
Equivalently, we find the largest \(m\) such that: \[ P(X \le n) \ge 1 - \alpha, \] where \(\alpha\) is our acceptable overbooking risk.

n <- 100       # number of rooms
p <- 0.95      # probability a guest shows up
alpha <- 0.01  # acceptable risk (1%)

# Find the largest m with P(X <= n) >= 1 - alpha, i.e., the most
# reservations we can accept while keeping P(X > n) at most alpha.
# (Searching upward from n: pbinom(n, n, p) = 1 always satisfies the
# condition, so the answer is the last m before the condition fails.)
find_m <- function(n, p, alpha) {
  m <- n
  while (pbinom(n, size = m + 1, prob = p) >= 1 - alpha) {
    m <- m + 1
  }
  m
}

m_opt <- find_m(n, p, alpha)
m_opt
## [1] 101
# The hotel should accept about m_opt reservations for n rooms
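
To confirm the choice, we can report the achieved overbooking risk at m_opt and show that accepting one more reservation would breach the 1% limit (values are approximate):

c(risk_at_m     = 1 - pbinom(n, size = m_opt,     prob = p),  # ~0.006, below alpha
  risk_plus_one = 1 - pbinom(n, size = m_opt + 1, prob = p))  # ~0.034, above alpha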