p <- 0.4 # success prob (gets a candy)
q <- 1 - p
## (a) Probability of 8 failures before 5 successes
r_a <- 5 # number of successes
k_a <- 8 # number of failures
# In R: dnbinom(x, size=r, prob=p) = P(#failures = x before r-th success)
prob_a <- dnbinom(k_a, size = r_a, prob = p)
prob_a
## [1] 0.08513638
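As a sanity check, the same value comes from writing out the negative binomial pmf, \(P(X = k) = \binom{k+r-1}{k} p^r q^k\):

# Hand-computed pmf check (uses q defined above)
choose(k_a + r_a - 1, k_a) * p^r_a * q^k_a
## [1] 0.08513638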
## (b) Probability that we need at least 10 doors to get 4 candies
r_b <- 4
## Method 1 (Binomial tail equivalence):
## "At least 10 trials" <=> at most 3 successes in first 9 trials
prob_b_binom <- pbinom(3, size = 9, prob = p)
## Method 2 (Negative Binomial):
## Y = trials to get 4 successes = failures + 4
## Y >= 10 <=> failures >= 6, so P(Y >= 10) = 1 - P(failures <= 5)
prob_b_negbin <- 1 - pnbinom(5, size = r_b, prob = p)
prob_b_binom
## [1] 0.4826097
prob_b_negbin
## [1] 0.4826097
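A quick Monte Carlo check (the number of replications is arbitrary) should land close to this value:

# Simulate failure counts before the 4th success and compare
set.seed(42)
mean(rnbinom(1e5, size = r_b, prob = p) >= 6) # should be near 0.4826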
# Rate per week
rate <- 0.2
## (a) 3 weeks: P(X = 1)
lambda_a <- rate * 3
prob_a <- dpois(1, lambda_a)
## (b) 5 weeks: P(X >= 2)
lambda_b <- rate * 5
prob_b <- ppois(1, lambda_b, lower.tail = FALSE) # = 1 - P(X <= 1)
## (c) 15 weeks: P(X <= 1)
lambda_c <- rate * 15
prob_c <- ppois(1, lambda_c, lower.tail = TRUE) # P(0)+P(1)
round(c(a = prob_a, b = prob_b, c = prob_c), 6)
## a b c
## 0.329287 0.264241 0.199148
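For part (a), the Poisson pmf can also be evaluated by hand, since \(P(X = 1) = \lambda e^{-\lambda}\):

# Hand-computed pmf check for part (a)
lambda_a * exp(-lambda_a)
## [1] 0.329287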
N <- 50 # total population
m <- 10 # tagged (successes) in population
k <- 20 # sample size
x <- 4 # observed tagged in second sample
# Hypergeometric probability P(X = 4)
p_x4 <- dhyper(x, m, N - m, k)
p_x4
## [1] 0.2800586
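Equivalently, from the hypergeometric pmf \(P(X = x) = \binom{m}{x}\binom{N-m}{k-x}\big/\binom{N}{k}\):

# Hand-computed pmf: favorable draws over total draws
choose(m, x) * choose(N - m, k - x) / choose(N, k)
## [1] 0.2800586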
A probability density function (pdf) is used for continuous random
variables.
Two key rules define a valid pdf: \(f(x) \ge 0\) for all \(x\), and \(\int_{-\infty}^{\infty} f(x)\,dx = 1\).
The height \(f(x)\) at a single point is not a probability. Probabilities come from areas under the curve over intervals: \(P(a < X < b) = \int_a^b f(x)\,dx\). Because only area matters, a pdf can be tall and narrow and still have total area 1. That is why it is perfectly fine for a pdf to take values greater than 1 at some points.
Example:
For a Uniform distribution on \([a,b]\), the pdf is \[
f(x)=\frac{1}{b-a}\quad \text{for } a \le x \le b.
\] If the interval is short, say \([0,
0.5]\), then \(f(x)=\frac{1}{0.5}=2>1\).
The area is still \(2 \times 0.5 = 1\),
so it’s a valid pdf.
a <- 0; b <- 0.5
height <- 1/(b - a) # pdf height for Uniform(a,b)
area <- height * (b - a) # total area
c(pdf_height = height, total_area = area)
## pdf_height total_area
## 2 1
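The same check can be run with R's built-in uniform density and numerical integration (a small sketch using only base R):

# dunif exceeds 1 everywhere on [0, 0.5], yet integrates to 1
integrate(function(x) dunif(x, min = a, max = b), lower = a, upper = b)$value
## [1] 1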
The standard normal distribution is perfectly symmetric around
zero.
Its density function is \[
\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2},
\] and because \(\phi(-z) =
\phi(z)\), the left and right tails of the distribution have the
same probability.
Therefore,
\[
P(Z \le -t) = P(Z \ge t).
\]
Now, the event \(|Z| \ge t\) means the random value of \(Z\) is either less than or equal to \(-t\) or greater than or equal to \(t\): \[ P(|Z| \ge t) = P(Z \ge t) + P(Z \le -t). \]
Since the two tails are equal,
\[
P(|Z| \ge t) = P(Z \ge t) + P(Z \ge t) = 2P(Z \ge t).
\]
This relationship follows directly from the symmetry property of the normal distribution [Continuous Distributions (Standard Normal section)].
t <- 1.5
lhs <- pnorm(-t) + (1 - pnorm(t)) # P(|Z| >= t)
rhs <- 2 * (1 - pnorm(t)) # 2 * P(Z >= t)
c(lhs = lhs, rhs = rhs)
## lhs rhs
## 0.1336144 0.1336144
The standard normal distribution is symmetric around
0.
Because of that symmetry, any odd power of \(z\) (like \(z^3\), \(z^5\)) is an odd function: it takes values of equal magnitude but opposite sign at \(+z\) and \(-z\). When we average (integrate) an odd function over a symmetric distribution centered at 0, the positives and negatives cancel out, so the result is 0.
For \(X \sim N(\mu, \sigma^2)\), the third moment is \[
E[X^3] = 3\mu\sigma^2 + \mu^3.
\] Setting \(\mu=0\) and \(\sigma^2=1\) for the standard normal gives \(E[Z^3]=0\). The same cancellation logic (symmetry/odd integrand) applies to \(E[Z^5]\), so it is also 0.
Answer:
\[
E[Z^3] = 0,\qquad E[Z^5] = 0.
\]
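A numerical check of this cancellation, using base R's integrate():

# Odd moments of the standard normal vanish (up to integration error)
odd_moment <- function(k) {
  integrate(function(z) z^k * dnorm(z), lower = -Inf, upper = Inf)$value
}
c(EZ3 = odd_moment(3), EZ5 = odd_moment(5)) # both are ~0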
The Uniform distribution on \([a, b]\) has pdf
\[ f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b,\\ 0, & \text{otherwise.} \end{cases} \]
\[ E[X] = \int_a^b x\,f(x)\,dx = \frac{1}{b-a}\int_a^b x\,dx = \frac{a+b}{2}. \]
\[ E[X^2] = \int_a^b x^2 f(x)\,dx = \frac{1}{b-a}\int_a^b x^2\,dx = \frac{a^2 + ab + b^2}{3}. \]
\[ \text{Var}(X) = E[X^2] - (E[X])^2 = \frac{(b - a)^2}{12}. \]
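These formulas are easy to verify numerically; the endpoints below are arbitrary example values:

# Check E[X] and Var(X) for an example Uniform(a, b)
a <- 2; b <- 5
EX <- integrate(function(x) x * dunif(x, a, b), a, b)$value
EX2 <- integrate(function(x) x^2 * dunif(x, a, b), a, b)$value
c(mean_formula = (a + b) / 2, mean_numeric = EX,
  var_formula = (b - a)^2 / 12, var_numeric = EX2 - EX^2)
# formula and numerical values agree: mean 3.5, variance 0.75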
The model:
\[ y = \beta_0 + \beta_1 x + \beta_2 (x - 1)_+ + \varepsilon, \] where:

- \(\beta_0\) is the intercept,
- \(\beta_1\) is the slope before \(x = 1\),
- \(\beta_2\) is the change in slope after \(x = 1\),
- \(\varepsilon\) is the random error.

Here \((x - 1)_+ = \max(x - 1, 0)\) is the hinge term: it equals 0 for \(x \le 1\) and grows linearly afterward, so the slope for \(x > 1\) becomes \(\beta_1 + \beta_2\).
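To make this concrete, here is a small simulation sketch (the true coefficients, noise level, and sample size are invented for illustration); fitting with lm() recovers both slopes:

# Simulate from y = 1 + 2x + 3(x - 1)_+ + eps and fit by least squares
set.seed(1)
x <- runif(200, 0, 2)
y <- 1 + 2 * x + 3 * pmax(x - 1, 0) + rnorm(200, sd = 0.3)
fit <- lm(y ~ x + pmax(x - 1, 0))
coef(fit) # estimates near (1, 2, 3); slope for x > 1 is about 2 + 3 = 5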
In probability terms, “being predictable” means that knowing one
variable gives us information about another.
This happens when two variables are dependent.
If we have two variables \(X\) and \(Y\):
They are independent if
\[
P(Y = j \mid X = i) = P(Y = j)
\quad \text{for all } i, j.
\] This means knowing \(X\)
doesn’t change the probability of \(Y\).
They are dependent if
\[
P(Y = j \mid X = i) \neq P(Y = j)
\quad \text{for some } i, j.
\] In this case, \(X\) provides information that helps us predict \(Y\), just like saying "you're predictable" after learning something about \(X\).
In contingency-table language (slide 31, from Notes 13), independence
means all rows of conditional probabilities \(P(Y=j|X=i)\) are the same.
When those rows differ, \(Y\) is predictable from \(X\): they are not independent, as the small numeric sketch below illustrates.
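A small made-up joint distribution shows the row comparison in action:

# Hypothetical joint pmf of X (rows) and Y (columns); values are invented
joint <- matrix(c(0.10, 0.30,
                  0.30, 0.30), nrow = 2, byrow = TRUE)
cond_rows <- joint / rowSums(joint) # row i holds P(Y = j | X = i)
cond_rows
# Row 1 is (0.25, 0.75); row 2 is (0.50, 0.50). The rows differ,
# so X and Y are dependent and X helps predict Y.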
This overbooking problem can be modeled using the Binomial distribution.
Let:
- \(p\) = probability that a person shows up (success),
- \(q = 1 - p\) = probability of a no-show,
- \(m\) = number of total reservations made,
- \(n\) = number of rooms available.
Then the number of guests who show up follows a Binomial distribution: \[ X \sim \text{Binomial}(m, p). \]
The goal is to choose \(m\) (the number of bookings to accept) so that: \[ P(X > n) \text{ is small, e.g., } 0.01 \text{ or } 0.05. \]
This means we allow only a small chance that more people show up than
there are rooms available.
Equivalently, since \(P(X \le n)\) decreases as \(m\) grows, we find the largest \(m\) such that \[
P(X \le n) \ge 1 - \alpha,
\] where \(\alpha\) is our acceptable overbooking risk.
n <- 100 # number of rooms
p <- 0.95 # probability a guest shows up
alpha <- 0.01 # acceptable risk (1%)
# Find the largest m with P(X <= n) >= 1 - alpha
find_m <- function(n, p, alpha) {
  m <- n # m = n always satisfies the constraint, since then P(X <= n) = 1
  # P(X <= n) is decreasing in m, so grow m until the next step would fail
  while (pbinom(n, size = m + 1, prob = p) >= 1 - alpha) {
    m <- m + 1
  }
  m
}
m_opt <- find_m(n, p, alpha)
m_opt
## [1] 101
# The hotel can accept up to m_opt reservations for the n rooms
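A boundary check confirms the answer: the constraint holds at m_opt but fails one reservation later.

# Constraint holds at m_opt ...
pbinom(n, size = m_opt, prob = p) >= 1 - alpha # TRUE
# ... but fails at m_opt + 1
pbinom(n, size = m_opt + 1, prob = p) >= 1 - alpha # FALSE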