library(tidyverse)
library(kableExtra)
In a certain region, times (minutes) between occurrences of earthquakes (of any magnitude) have a distribution with percentiles displayed in the table below.
# Data and labels: ten equal slices of 10% each, labeled with the percentile values
data <- rep(10, 10)
labels <- c("12.6", "26.8", "42.8", "61.3", "83.2", "110.0", "144.5", "193.1", "276.3", "?")
# Percentage of the spinner covered by each slice
percentages <- round((data / sum(data)) * 100, 1)
percentage_labels <- paste(labels, "\n", percentages, "%", sep = "")
# Use one color vector so the slices and the legend match
slice_colors <- rainbow(length(labels))
# Create pie chart (spinner) with value and percentage labels on each slice
pie(data, labels = percentage_labels, col = slice_colors, cex = 0.8,
    main = "Spinner: time between earthquakes (minutes)")
# Add legend
legend("topright", legend = labels, title = "Values", fill = slice_colors)
40% are between 26.8 and 110 minutes because 60% are below 110 and 20% are below 26.8.
\(F(42.8) = P(X \le 42.8) = 0.3\), so 42.8 is the 30th percentile.
\(Q(0.9) = 276.3\). Again, 276.3 is the 90th percentile.
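The three answers above are all lookups in the percentile table, which can be sketched in R as follows. The helper names `cdf_at` and `quantile_at` are hypothetical and only work at the tabulated values.

```r
# Percentile table from the problem statement (values in minutes)
p <- (1:9) / 10
q <- c(12.6, 26.8, 42.8, 61.3, 83.2, 110.0, 144.5, 193.1, 276.3)

# cdf lookup at a tabulated value: F(x) is the probability paired with x
cdf_at <- function(x) p[match(x, q)]
# Quantile lookup at a tabulated probability: Q(prob) is the value paired with prob
quantile_at <- function(prob) q[which(abs(p - prob) < 1e-9)]

cdf_at(110.0) - cdf_at(26.8)  # P(26.8 < X < 110)
cdf_at(42.8)                  # F(42.8)
quantile_at(0.9)              # Q(0.9)
```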
Sketch (by hand) a histogram of this distribution.
| Percentile | Value (minutes) |
|---|---|
| 10th | 12.6 |
| 20th | 26.8 |
| 30th | 42.8 |
| 40th | 61.3 |
| 50th | 83.2 |
| 60th | 110.0 |
| 70th | 144.5 |
| 80th | 193.1 |
| 90th | 276.3 |
(Continued from previous HW.) In the Regina/Cady problem, let \(W=|R-Y|\) be the amount of time the first person to arrive has to wait for the second person. Recall that \(W\) is a continuous random variable with pdf \[ f_W(w) = 2(1-w), \quad 0 < w < 1. \]
From last HW, part 3.3: “\(P(W<0.25)=\int_{0}^{0.25} 2(1-w)\,dw\).
If you add the areas of the rectangle and the triangle you get 0.4375. If they were to meet over lots of days, the first person would have to wait less than 15 minutes on 43.75% of days.”
\[ F_W(w) = \int_{0}^{w} 2(1-u)\,du = 1-(1-w)^2, \quad 0 < w < 1. \]
\(Q(0.25)\) is the value \(w\) for which \(F_W(w) = P(W \le w) = 0.25\): \(Q(0.25) = 1-\sqrt{1-0.25} = 0.134\). On 25% of days the waiting time is at most 0.134 hours (about 8 minutes).
\[ Q_W(p) = 1-\sqrt{1-p}, \quad 0 \le p \le 1. \]
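The quantile function comes from solving \(F_W(w) = p\) for \(w\). A short derivation, taking the positive square root because \(0 < w < 1\) implies \(1-w > 0\):

\[
\begin{aligned}
1-(1-w)^2 &= p \\
(1-w)^2 &= 1-p \\
1-w &= \sqrt{1-p} \\
w &= 1-\sqrt{1-p}
\end{aligned}
\]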
Sketch a spinner corresponding to the distribution of \(W\). Label the 25th, 50th, and 75th percentiles.
\(Q_W(0.25) = 1-\sqrt{1-0.25} = 0.134\), \(Q_W(0.5) = 1-\sqrt{1-0.5} = 0.293\), \(Q_W(0.75) = 1-\sqrt{1-0.75} = 0.5\)
N_rep = 10000
x = runif(N_rep, 0, 1)
y = runif(N_rep, 0, 1)
w = abs(x - y)
quantile(w, c(0.25, 0.5, 0.75))
## 25% 50% 75%
## 0.1325632 0.2904813 0.4985520
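As a rough check, the simulated quantiles can be set side by side with the formula \(1-\sqrt{1-p}\). This is a minimal sketch. The function name `Q_W` and the seed are assumptions added here for illustration, and the simulated row will differ slightly from the output above because the random draws differ.

```r
# Quantile function obtained by inverting F_W(w) = 1 - (1 - w)^2
Q_W <- function(p) 1 - sqrt(1 - p)

set.seed(1)  # arbitrary seed, assumed here only for reproducibility
N_rep <- 10000
w <- abs(runif(N_rep, 0, 1) - runif(N_rep, 0, 1))

probs <- c(0.25, 0.5, 0.75)
# One row of theoretical quantiles, one row of simulated quantiles
comparison <- rbind(theory = Q_W(probs), simulated = quantile(w, probs))
round(comparison, 3)
```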
Daily high temperatures (degrees Fahrenheit) in San Luis Obispo in August follow (approximately) a Normal distribution with a mean of 76.9 degrees F. The temperature exceeds 100 degrees Fahrenheit on about 1.5% of August days.
qnorm(0.985)
## [1] 2.17009
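100 degrees F is the 98.5th percentile, which is 2.17 standard deviations above the mean, so SD = (100-76.9)/2.17 = 10.65 degrees F.
If the mean were 78.9 degrees F instead, 100 degrees F would be (100-78.9)/10.65 = 1.98 standard deviations above the mean.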
pnorm(1.98, mean = 0, sd = 1)
## [1] 0.9761482
1-0.976=0.024, so about 2.4% of August days would be over 100 degrees F.
0.024/0.015=1.6. A 2-degree F increase in the mean (about 2.6%) increases the probability of a day above 100 degrees F by about 60%.
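The same calculation can be carried out without intermediate rounding. This is a sketch that assumes, as the arithmetic above suggests, that the mean rises by 2 degrees F while the SD stays the same. The variable names are illustrative.

```r
mean_aug <- 76.9
z <- qnorm(0.985)               # 98.5th percentile of the standard Normal
sd_aug <- (100 - mean_aug) / z  # SD implied by "1.5% of days exceed 100"

# Check: with this SD, about 1.5% of days exceed 100 degrees F
p_old <- 1 - pnorm(100, mean = mean_aug, sd = sd_aug)

# Assumed scenario: mean increases by 2 degrees F, SD unchanged
p_new <- 1 - pnorm(100, mean = mean_aug + 2, sd = sd_aug)

c(sd = sd_aug, p_old = p_old, p_new = p_new, ratio = p_new / p_old)
```

The unrounded ratio comes out slightly below the 1.6 obtained from rounded values, which is why "about 60%" is the appropriate wording.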
In a large class, scores on midterm 1 follow (approximately) a Normal distribution and scores on midterm 2 follow (approximately) a Normal distribution. Note that the SD is the same on both exams. The 40th percentile of midterm 1 scores is equal to the 70th percentile of midterm 2 scores. Compute (M1-M2)/SD
qnorm(0.7)
## [1] 0.5244005
The 70th percentile is 0.524 SDs above the mean.
qnorm(0.4)
## [1] -0.2533471
The 40th percentile is 0.253 SDs below the mean.
\(M_1 - 0.253\,SD = M_2 + 0.524\,SD\), so \((M_1-M_2)/SD = 0.524+0.253 = 0.777 \approx 0.78\). The mean for midterm 1 is 0.78 standard deviations higher than the mean for midterm 2.
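A quick numerical check of this result, using hypothetical values for the common SD and the midterm 2 mean (neither is given in the problem; any values would do):

```r
# (M1 - M2) / SD is the gap between the two standardized percentiles
z_diff <- qnorm(0.7) - qnorm(0.4)

# Hypothetical values, chosen only to illustrate the check
SD <- 10
M2 <- 70
M1 <- M2 + z_diff * SD

# The 40th percentile of midterm 1 should equal the 70th percentile of midterm 2
c(z_diff = z_diff,
  p40_midterm1 = qnorm(0.4, mean = M1, sd = SD),
  p70_midterm2 = qnorm(0.7, mean = M2, sd = SD))
```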
The latest series of collectible Lego Minifigures contains 3 different Minifigure prizes (labeled 1, 2, 3). Each package contains a single unknown prize. Suppose we only buy 3 packages and we consider as our sample space outcome the results of just these 3 packages (prize in package 1, prize in package 2, prize in package 3). For example, 323 (or (3, 2, 3)) represents prize 3 in the first package, prize 2 in the second package, prize 3 in the third package. Let \(X\) be the number of distinct prizes obtained in these 3 packages. Let \(Y\) be the number of these 3 packages that contain prize 1. Suppose that each package is equally likely to contain any of the 3 prizes, regardless of the contents of other packages. There are 27 possible, equally likely outcomes
| box1 | box2 | box3 | X | Y |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 3 |
| 2 | 1 | 1 | 2 | 2 |
| 3 | 1 | 1 | 2 | 2 |
| 1 | 2 | 1 | 2 | 2 |
| 2 | 2 | 1 | 2 | 1 |
| 3 | 2 | 1 | 3 | 1 |
| 1 | 3 | 1 | 2 | 2 |
| 2 | 3 | 1 | 3 | 1 |
| 3 | 3 | 1 | 2 | 1 |
| 1 | 1 | 2 | 2 | 2 |
| 2 | 1 | 2 | 2 | 1 |
| 3 | 1 | 2 | 3 | 1 |
| 1 | 2 | 2 | 2 | 1 |
| 2 | 2 | 2 | 1 | 0 |
| 3 | 2 | 2 | 2 | 0 |
| 1 | 3 | 2 | 3 | 1 |
| 2 | 3 | 2 | 2 | 0 |
| 3 | 3 | 2 | 2 | 0 |
| 1 | 1 | 3 | 2 | 2 |
| 2 | 1 | 3 | 3 | 1 |
| 3 | 1 | 3 | 2 | 1 |
| 1 | 2 | 3 | 3 | 1 |
| 2 | 2 | 3 | 2 | 0 |
| 3 | 2 | 3 | 2 | 0 |
| 1 | 3 | 3 | 2 | 1 |
| 2 | 3 | 3 | 2 | 0 |
| 3 | 3 | 3 | 1 | 0 |
See table
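The X and Y columns of the outcome table can also be generated in R rather than by hand. This is a sketch; the column names `box1`, `box2`, `box3` mirror the table header, and `expand.grid` varies the first column fastest, which matches the row order of the table.

```r
# All 27 equally likely outcomes (box1 varies fastest)
outcomes <- expand.grid(box1 = 1:3, box2 = 1:3, box3 = 1:3)

# X: number of distinct prizes; Y: number of packages containing prize 1
boxes <- as.matrix(outcomes[, c("box1", "box2", "box3")])
outcomes$X <- apply(boxes, 1, function(row) length(unique(row)))
outcomes$Y <- apply(boxes, 1, function(row) sum(row == 1))

outcomes
```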
Construct a two-way table representing the joint distribution of \(X\) and \(Y\).
| x \ y | 0 | 1 | 2 | 3 | Total |
|---|---|---|---|---|---|
| 1 | 2/27 | 0 | 0 | 1/27 | 3/27 |
| 2 | 6/27 | 6/27 | 6/27 | 0 | 18/27 |
| 3 | 0 | 6/27 | 0 | 0 | 6/27 |
| Total | 8/27 | 12/27 | 6/27 | 1/27 | 1 |
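The two-way table can be cross-checked by tabulating X and Y over the 27 outcomes. This is a minimal sketch; `joint_counts` and `joint_probs` are names introduced here for illustration.

```r
outcomes <- expand.grid(1:3, 1:3, 1:3)
X <- apply(outcomes, 1, function(row) length(unique(row)))
Y <- apply(outcomes, 1, function(row) sum(row == 1))

# Counts out of 27 equally likely outcomes, with row and column totals
joint_counts <- table(X, Y)
addmargins(joint_counts)

# Joint probabilities (divide each count by 27)
joint_probs <- joint_counts / nrow(outcomes)
```

The margins of `addmargins(joint_counts)` are the numerators of the marginal distributions of X and Y.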
Sketch a plot representing the joint distribution of X and Y.
Construct a spinner corresponding to the joint distribution of \(X\) and \(Y\).
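# Labels are the possible (x, y) pairs; values are their joint probabilities
labels <- c("1, 0", "1, 3", "2, 0", "2, 1", "2, 2", "3, 1")
values <- c(2/27, 1/27, 6/27, 6/27, 6/27, 6/27)
# Use one color vector so the slices and the legend match
slice_colors <- rainbow(length(labels))
# Create a pie chart (spinner)
pie(values, labels = labels, col = slice_colors, main = "Spinner: joint distribution of (X, Y)")
# Adding a legend
legend("topright", legend = labels, fill = slice_colors, cex = 0.8)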
Label 3 marbles 1, 2, and 3. Put them in a hat, pick one out, record its label, and then put it back. Repeat 3 times. Measure X and Y for the resulting outcome.
or
Simulate using the spinner to get an (X, Y) pair.
Identify the marginal distribution of X, and construct a corresponding spinner.
| x | P(X=x) |
|---|---|
| 1 | 3/27 |
| 2 | 18/27 |
| 3 | 6/27 |
Identify the marginal distribution of Y, and construct a corresponding spinner.
| y | P(Y=y) |
|---|---|
| 0 | 8/27 |
| 1 | 12/27 |
| 2 | 6/27 |
| 3 | 1/27 |
Describe in detail in words how you could conduct a simulation and use the results to approximate. (This part is asking you to describe the process in words; not to write code.)
Code and run the simulation from the previous part and use the simulation results to approximate each of
# Total number of possible outcomes
total_outcomes <- 27
# Number of outcomes where all 3 prizes are distinct: the 3! = 6 orderings of (1, 2, 3)
desired_outcomes <- factorial(3)
# Probability of X = 3
prob_X_3 <- desired_outcomes / total_outcomes
# Print the result
print(prob_X_3)
## [1] 0.2222222
P(X=3)=6/27=0.222
total_outcomes <- 27 # Total number of equally likely outcomes
desired_outcomes <- 0 # Number of outcomes meeting the condition
for (i in 1:3) {
  for (j in 1:3) {
    for (k in 1:3) {
      if (length(unique(c(i, j, k))) == 2 && sum(c(i, j, k) == 1) == 1) {
        desired_outcomes <- desired_outcomes + 1
      }
    }
  }
}
probability <- desired_outcomes / total_outcomes
probability
## [1] 0.2222222
p(X=2, Y=1)=0.222
# List of all possible outcomes
outcomes <- expand.grid(1:3, 1:3, 1:3)
# Calculate X for each outcome
X_values <- apply(outcomes, 1, function(row) length(unique(row)))
# Calculate the expected value of X
expected_X <- mean(X_values)
expected_X
## [1] 2.111111
E(X)=2.11
# Total number of possible outcomes
total_outcomes <- 27
# Initialize the expected value of Y
expected_Y <- 0
# Iterate through the possible values of Y and calculate the expected value
for (y in 0:3) {
# Probability of Y being y
prob_Y <- choose(3, y) * (1/3)^y * (2/3)^(3-y)
# Update the expected value by adding the product of Y and its probability
expected_Y <- expected_Y + y * prob_Y
}
# Print the expected value of Y
print(expected_Y)
## [1] 1
E(Y)=1
# Create a matrix to represent all possible outcomes
outcomes <- expand.grid(1:3, 1:3, 1:3)
# Calculate the values of X and Y for each outcome
X <- apply(outcomes, 1, function(row) length(unique(row)))
Y <- apply(outcomes, 1, function(row) sum(row == 1))
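# Calculate E(XY): average X * Y over the 27 outcomes
# (nrow() counts outcomes; length() of a data frame counts its 3 columns)
EXY <- sum(X * Y) / nrow(outcomes)
EXY
## [1] 2.111111
E(XY)=2.11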
# Define the probability distribution of X (marginal distribution from the joint table)
prob_X <- c(3/27, 18/27, 6/27) # Probabilities for X = 1, 2, 3
# Values of X
values_X <- 1:3
# Mean of X
mean_X <- sum(values_X * prob_X)
# Calculate SD(X)
sd_X <- sqrt(sum(prob_X * (values_X - mean_X)^2))
sd_X
## [1] 0.5665577
SD(X)=0.567
n <- 3
p <- 1/3
sd_Y <- sqrt(n * p * (1 - p))
sd_Y
## [1] 0.8164966
SD(Y)=0.816
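The values above are exact calculations. A simulation-based approximation, following the marble description, might look like the sketch below. The seed, the number of repetitions, and the list of quantities are assumptions inferred from the calculations above, and the results vary slightly from run to run.

```r
set.seed(2024)  # arbitrary seed, assumed here only for reproducibility
N_rep <- 10000

# Each row is one repetition: the prizes in 3 packages, each equally likely 1, 2, or 3
packages <- matrix(sample(1:3, 3 * N_rep, replace = TRUE), ncol = 3)

# X: number of distinct prizes; Y: number of packages containing prize 1
X <- apply(packages, 1, function(row) length(unique(row)))
Y <- rowSums(packages == 1)

# Relative frequencies and sample averages approximate the exact values
sim <- c(
  "P(X=3)"     = mean(X == 3),
  "P(X=2,Y=1)" = mean(X == 2 & Y == 1),
  "E(X)"       = mean(X),
  "E(Y)"       = mean(Y),
  "E(XY)"      = mean(X * Y),
  "SD(X)"      = sd(X),
  "SD(Y)"      = sd(Y)
)
round(sim, 3)
```

With 10000 repetitions the approximations should land within a few hundredths of the exact answers.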