Problem 1 — Normal Distribution

Let \(X \sim \text{Normal}(\mu=10,\ \sigma=6)\).

mu <- 10
sigma <- 6

# P(X <= 5)
p1_1 <- pnorm(5, mean = mu, sd = sigma)
# P(4 < X <= 16), i.e. within one standard deviation of the mean
p1_2 <- pnorm(16, mean = mu, sd = sigma) - pnorm(4, mean = mu, sd = sigma)
# P(X <= 8)
p1_3 <- pnorm(8, mean = mu, sd = sigma)

p1_1
## [1] 0.2023284
p1_2
## [1] 0.6826895
p1_3
## [1] 0.3694413
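
As a cross-check, the same probabilities can be obtained by standardizing first, since \(P(X \le x) = \Phi\left(\frac{x-\mu}{\sigma}\right)\) with \(\Phi\) the standard normal CDF. The sketch below is illustrative only; the helper name `z` and the `_z` variables are chosen here and are not part of the original solution.

```r
# Cross-check of Problem 1 via z-scores (illustrative sketch)
mu <- 10
sigma <- 6

# Hypothetical helper: convert a raw value to its z-score
z <- function(x) (x - mu) / sigma

# pnorm() with default mean = 0, sd = 1 is the standard normal CDF
p1_1_z <- pnorm(z(5))
p1_2_z <- pnorm(z(16)) - pnorm(z(4))
p1_3_z <- pnorm(z(8))

# Each comparison should report TRUE (agreement up to floating-point tolerance)
all.equal(p1_1_z, pnorm(5, mean = mu, sd = sigma))
all.equal(p1_2_z, pnorm(16, mean = mu, sd = sigma) - pnorm(4, mean = mu, sd = sigma))
all.equal(p1_3_z, pnorm(8, mean = mu, sd = sigma))
```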

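Answers (as computed above)

  • \(P(X \le 5) \approx 0.2023\)
  • \(P(4 < X \le 16) \approx 0.6827\)
  • \(P(X \le 8) \approx 0.3694\)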

Problem 2 — Normal Distribution (Z-Score Distance From Mean)

Let \(X \sim \text{Normal}(\mu=300,\ \sigma=50)\).

mu <- 300
sigma <- 50

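# Two-sided tail probability P(|X - mu| > k * sigma) for k = 1, 2, 3,
# using symmetry: twice the upper tail beyond mu + k * sigma
p2_1 <- 2 * (1 - pnorm(mu + 1*sigma, mean = mu, sd = sigma))
p2_2 <- 2 * (1 - pnorm(mu + 2*sigma, mean = mu, sd = sigma))
p2_3 <- 2 * (1 - pnorm(mu + 3*sigma, mean = mu, sd = sigma))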

p2_1
## [1] 0.3173105
p2_2
## [1] 0.04550026
p2_3
## [1] 0.002699796
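
Because the distance is measured in standard deviations, these probabilities do not depend on \(\mu\) or \(\sigma\): \(P(|X-\mu| > k\sigma) = 2\,\Phi(-k)\). A minimal sketch of that shortcut follows; the names `k`, `p_beyond`, and `p_check` are chosen here for illustration.

```r
# Two-sided tail probabilities from the standard normal alone (illustrative sketch)
k <- 1:3
p_beyond <- 2 * pnorm(-k)          # 2 * Phi(-k), vectorized over k
names(p_beyond) <- paste0(k, " SD")
p_beyond

# Same quantity computed with the original parameters, for comparison
mu <- 300
sigma <- 50
p_check <- 2 * pnorm(mu + k * sigma, mean = mu, sd = sigma, lower.tail = FALSE)

# Should report TRUE
all.equal(unname(p_beyond), p_check)
```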

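Answers (as computed above)

  • \(P(|X-\mu| > 1\sigma) \approx 0.3173\)
  • \(P(|X-\mu| > 2\sigma) \approx 0.0455\)
  • \(P(|X-\mu| > 3\sigma) \approx 0.0027\)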

Problem 3 — Classification Metrics / ROC

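Assume

\[ X \mid (D=1) \sim \text{Normal}(\mu=70,\ \sigma=15), \qquad X \mid (D=0) \sim \text{Normal}(\mu=50,\ \sigma=10), \]

and the rule: test positive if \(X > x^*\).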

(1) Plot PDF and CDF of \(X \mid (D=1)\) for \(x=20,\dots,120\)

mu1 <- 70
sd1 <- 15

x <- 20:120
pdf <- dnorm(x, mean = mu1, sd = sd1)
cdf <- pnorm(x, mean = mu1, sd = sd1)

plot(x, pdf, type = "l", xlab = "x", ylab = "Density",
     main = "PDF of X | (D = 1) ~ Normal(70, 15)")

plot(x, cdf, type = "l", xlab = "x", ylab = "CDF",
     main = "CDF of X | (D = 1) ~ Normal(70, 15)")
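
The two curves are linked: the CDF at a point is the area under the PDF up to that point. A quick numerical check of this, assuming an arbitrary evaluation point `x_check = 85` chosen here purely for illustration:

```r
# Check that the CDF equals the integral of the PDF (illustrative sketch)
mu1 <- 70
sd1 <- 15
x_check <- 85   # arbitrary point chosen for this example, not from the original

# Numerically integrate the density from -Inf up to x_check
area <- integrate(dnorm, lower = -Inf, upper = x_check, mean = mu1, sd = sd1)$value

# Closed-form CDF value at the same point
cdf_value <- pnorm(x_check, mean = mu1, sd = sd1)

# The two numbers should agree up to the integration tolerance
c(area = area, cdf = cdf_value)
```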

(2) ROC curve for cutoffs \(52 < x^* \le 65\)

For a cutoff \(c\) (a candidate value of \(x^*\)), with \(T=1\) denoting a positive test:

  • TPR (Sensitivity) \(= P(T=1 \mid D=1) = P(X>c \mid D=1)\)
  • FPR \(= P(T=1 \mid D=0) = P(X>c \mid D=0)\)

So, with \(\Phi\) the standard normal CDF, \[ \text{TPR}(c)=1-\Phi\left(\frac{c-70}{15}\right),\quad \text{FPR}(c)=1-\Phi\left(\frac{c-50}{10}\right). \]
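
To make the two rates concrete at a single cutoff, the sketch below wraps them in a small function. The function name `rates_at` and the example cutoff of 60 are assumptions made here for illustration; the distribution parameters are the ones used in this problem.

```r
# TPR and FPR at one cutoff (illustrative sketch; rates_at is a hypothetical helper)
rates_at <- function(cutoff, mu1 = 70, sd1 = 15, mu0 = 50, sd0 = 10) {
  c(
    # P(X > cutoff | D = 1): upper tail of the diseased distribution
    tpr = pnorm(cutoff, mean = mu1, sd = sd1, lower.tail = FALSE),
    # P(X > cutoff | D = 0): upper tail of the healthy distribution
    fpr = pnorm(cutoff, mean = mu0, sd = sd0, lower.tail = FALSE)
  )
}

r60 <- rates_at(60)
r60
```

Using `lower.tail = FALSE` is equivalent to `1 - pnorm(...)` and avoids a small loss of precision far in the tail.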

mu0 <- 50
sd0 <- 10

cutoff <- seq(52.1, 65.0, by = 0.1)

tpr <- 1 - pnorm(cutoff, mean = mu1, sd = sd1)
fpr <- 1 - pnorm(cutoff, mean = mu0, sd = sd0)

plot(fpr, tpr, type = "l",
     xlab = "False Positive Rate (FPR)",
     ylab = "True Positive Rate (TPR)",
     main = "ROC Curve (52 < x* <= 65)")
abline(0, 1, lty = 2)

(3) Equal cost: FPR and FNR equally bad

False Negative Rate (FNR) at cutoff \(c\): \[ \text{FNR}(c) = P(T=0 \mid D=1) = P(X \le c \mid D=1). \]

We pick \(x^*\) so that FPR and FNR are as equal as possible on the given grid (step 0.1), i.e. minimize \(|\text{FPR}(c)-\text{FNR}(c)|\).

fnr <- pnorm(cutoff, mean = mu1, sd = sd1)
diff <- abs(fpr - fnr)

i_best <- which.min(diff)

best_cutoff <- cutoff[i_best]
best_fpr <- fpr[i_best]
best_fnr <- fnr[i_best]

best_cutoff
## [1] 58
best_fpr
## [1] 0.2118554
best_fnr
## [1] 0.2118554
diff[i_best]
## [1] 2.775558e-17
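
The grid search can be cross-checked without a grid. Setting \(\text{FPR}(c)=\text{FNR}(c)\) gives \(\Phi\left(\frac{50-c}{10}\right)=\Phi\left(\frac{c-70}{15}\right)\), so \(\frac{50-c}{10}=\frac{c-70}{15}\) and \(c = \frac{15\cdot 50 + 10\cdot 70}{10+15} = 58\). The sketch below confirms this with a root finder; the names `gap`, `root`, and `closed_form` are chosen here for illustration.

```r
# Solve FPR(c) = FNR(c) directly (illustrative sketch)
mu1 <- 70; sd1 <- 15   # X | D = 1
mu0 <- 50; sd0 <- 10   # X | D = 0

# Signed difference FPR(c) - FNR(c); its root is the equal-error cutoff
gap <- function(cutoff) {
  fpr <- pnorm(cutoff, mean = mu0, sd = sd0, lower.tail = FALSE)
  fnr <- pnorm(cutoff, mean = mu1, sd = sd1)
  fpr - fnr
}

# gap() is positive at 52 and negative at 65, so a root lies in between
root <- uniroot(gap, interval = c(52, 65), tol = 1e-10)$root

# Closed form obtained by equating the two z-scores
closed_form <- (sd1 * mu0 + sd0 * mu1) / (sd0 + sd1)

c(root = root, closed_form = closed_form)
```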

Answer

  • \(x^* \approx 58\)

Outside Resources

R functions:

https://cran.r-project.org/doc/manuals/R-intro.pdf

LaTeX format:

https://www.cmor-faculty.rice.edu/~heinken/latex/symbols.pdf