GSB518 HW4

library(tidyverse)
library(kableExtra)

Problem 1

In a certain region, times (minutes) between occurrences of earthquakes (of any magnitude) have a distribution with percentiles displayed in the table below.

Construct a spinner corresponding to this distribution.

# Data and labels: ten equal 10% sectors, each labeled with its upper percentile value
# (the last sector has no upper value in the table, so it is labeled "?")
data <- rep(10, 10)
labels <- c("12.6", "26.8", "42.8", "61.3", "83.2", "110.0", "144.5", "193.1", "276.3", "?")
cols <- rainbow(length(labels))

# Optional: add percentages to the sector labels
percentages <- round((data / sum(data)) * 100, 1)
percentage_labels <- paste(labels, "\n", percentages, "%", sep = "")

# Create pie chart (spinner); pass the colors so the legend matches the sectors
pie(data, labels = percentage_labels, col = cols, main = "Spinner: minutes between earthquakes")

# Add legend
legend("topright", legend = labels, title = "Values", fill = cols)

What percent of times are between 26.8 and 110.0 minutes?

40% are between 26.8 and 110 minutes because 60% are below 110 and 20% are below 26.8.

Let $F$ be the cdf. Evaluate and interpret $F(42.8)$.

$F(42.8) = P(X \le 42.8) = 0.3$. That is, 42.8 minutes is the 30th percentile: 30% of times between earthquakes are at most 42.8 minutes.

Let $Q$ be the quantile function. Evaluate and interpret $Q(0.9)$.

$Q(0.9) = 276.3$. That is, 276.3 minutes is the 90th percentile: 90% of times between earthquakes are at most 276.3 minutes.
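The three answers above can be checked with a small lookup sketch. The helper names `F_tab` and `Q_tab` are made up for illustration, and they only work at the nine values listed in the percentile table (no interpolation between them is assumed).

```r
# Percentile table from the problem: probabilities and values (minutes)
pct <- (1:9) / 10
minutes <- c(12.6, 26.8, 42.8, 61.3, 83.2, 110.0, 144.5, 193.1, 276.3)

# Hypothetical helpers: cdf and quantile lookups at the tabulated points only
F_tab <- function(x) pct[match(x, minutes)]
Q_tab <- function(p) minutes[which(abs(pct - p) < 1e-8)]

F_tab(42.8)                               # cdf at 42.8 minutes
Q_tab(0.9)                                # 90th percentile
p_between <- F_tab(110.0) - F_tab(26.8)   # P(26.8 < X <= 110.0)
p_between
```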

Sketch (by hand) a histogram of this distribution.

Percentile	Value (minutes)
10th	12.6
20th	26.8
30th	42.8
40th	61.3
50th	83.2
60th	110.0
70th	144.5
80th	193.1
90th	276.3

Problem 2

(Continued from previous HW.) In the Regina/Cady problem, let $W=|R-Y|$ be the amount of time the first person to arrive has to wait for the second person. Recall that $W$ is a continuous random variable with pdf \[ f_W(w) = 2(1-w), \quad 0 < w < 1. \]

Let $F$ be the cdf. Evaluate and interpret $F(0.25)$.

From last HW part 3.3: “$P(W<0.25)=\int_{0}^{0.25} 2(1-w)\,dw$

If you add the area of the rectangle and triangle under the pdf you get 0.4375. If they were to meet over lots of days, the first person to arrive would have to wait less than 15 minutes on 43.75% of days.” So $F(0.25) = 0.4375$.

Find an expression for the cdf of $W$. Set up an integral, but sketch a picture and use geometry to compute.

$F_W(w)=\int_{0}^{w} 2(1-u)\,du = 1-(1-w)^2$, for $0 < w < 1$.
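As a quick numerical check on the geometry, the sketch below integrates the pdf with base R's `integrate()` and compares the result with the closed-form cdf. The function names `f_W` and `F_W` are chosen here for illustration.

```r
# pdf of W and the closed-form cdf found with geometry
f_W <- function(w) 2 * (1 - w)
F_W <- function(w) 1 - (1 - w)^2

# Numerically integrate the pdf from 0 to 0.25
numerical <- integrate(f_W, lower = 0, upper = 0.25)$value

# The two values should agree (both 0.4375)
c(numerical = numerical, closed_form = F_W(0.25))
```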

Find and interpret the 25th percentile of $W$. (You can do the next part first if you want, but it might help to start with a specific number like in this part.)

Set $F_W(w) = 0.25$ and solve for $w$: $1-(1-w)^2 = 0.25$, so $w = 1-\sqrt{1-0.25} = 0.134$. That is, $Q(0.25) = 0.134$ hours (about 8 minutes): the waiting time is at most 0.134 hours on 25% of days.

Find the quantile function of $W$.

$Q_W(p)=1-\sqrt{1-p}$, for $0 \le p \le 1$

Sketch a spinner corresponding to the distribution of $W$. Label the 25th, 50th, and 75th percentiles.

$Q_W(0.25)=1-\sqrt{1-0.25}=0.134$

$Q_W(0.5)=1-\sqrt{1-0.5}=0.293$

$Q_W(0.75)=1-\sqrt{1-0.75}=0.5$
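The quartiles can be reproduced with a short sketch of the quantile function. The names `Q_W` and `F_W` are illustrative, and the last line simply confirms that the cdf and quantile function undo each other.

```r
# Quantile function and cdf of W
Q_W <- function(p) 1 - sqrt(1 - p)
F_W <- function(w) 1 - (1 - w)^2

p <- c(0.25, 0.5, 0.75)

# 25th, 50th, and 75th percentiles, rounded for labeling the spinner
quartiles <- round(Q_W(p), 3)
quartiles

# Plugging the quantiles back into the cdf recovers the probabilities
F_W(Q_W(p))
```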

Coding required. Use your simulation results from before to approximate the 25th, 50th, and 75th percentiles.

N_rep = 10000

x = runif(N_rep, 0, 1)
y = runif(N_rep, 0, 1)

w = abs(x - y)

quantile(w, c(0.25, 0.5, 0.75))

##       25%       50%       75% 
## 0.1325632 0.2904813 0.4985520

Problem 3

Daily high temperatures (degrees Fahrenheit) in San Luis Obispo in August follow (approximately) a Normal distribution with a mean of 76.9 degrees F. The temperature exceeds 100 degrees Fahrenheit on about 1.5% of August days.

What is the standard deviation?

qnorm(0.985)

## [1] 2.17009

100 degrees F is the 98.5th percentile, which is 2.17 standard deviations above the mean, so SD=(100-76.9)/2.17=10.65 degrees F.
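The same calculation can be done without rounding the z-score by hand. This is a minimal sketch; the variable names are chosen here for illustration.

```r
mean_temp <- 76.9   # mean daily high (degrees F)
threshold <- 100    # temperature of interest (degrees F)
p_exceed  <- 0.015  # proportion of days above the threshold

# z-score of the threshold: the (1 - 0.015) quantile of the standard Normal
z <- qnorm(1 - p_exceed)

# Solve threshold = mean + z * SD for SD
sd_temp <- (threshold - mean_temp) / z
sd_temp

# Check: with this SD, the chance of exceeding 100 degrees F is 1.5%
p_check <- pnorm(threshold, mean = mean_temp, sd = sd_temp, lower.tail = FALSE)
p_check
```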

Suppose the mean increases by 2 degrees Fahrenheit. On what percentage of August days will the daily high temperature exceed 100 degrees Fahrenheit? (Assume the standard deviation does not change.)

(100-78.9)/10.65=1.98 standard deviations above the mean

pnorm(1.98, mean = 0, sd = 1)

## [1] 0.9761482

1-0.976=0.024. 2.4% of days will be over 100 degrees F in August

A mean of 78.9 is 1.02 times greater than a mean of 76.9. By what (multiplicative) factor has the percentage of 100-degree days increased? What do you notice?

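0.024/0.015=1.6. A roughly 2 percent increase in the mean increases the percentage of days above 100 degrees F by about 60%. A small relative shift in the mean produces a much larger relative change in the tail probability.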
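As a cross-check, the factor can be computed directly from `pnorm()` with unrounded inputs. The result may differ slightly from 1.6 because the hand calculation rounds the SD and the z-score. Variable names are illustrative.

```r
# SD implied by the original 1.5% exceedance rate
sd_temp <- (100 - 76.9) / qnorm(0.985)

# Proportion of days above 100 degrees F before and after the 2-degree shift
p_old <- pnorm(100, mean = 76.9, sd = sd_temp, lower.tail = FALSE)
p_new <- pnorm(100, mean = 78.9, sd = sd_temp, lower.tail = FALSE)

# Multiplicative change in the mean versus in the tail probability
mean_factor <- 78.9 / 76.9
tail_factor <- p_new / p_old
c(p_old = p_old, p_new = p_new, mean_factor = mean_factor, tail_factor = tail_factor)
```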

Problem 4

In a large class, scores on midterm 1 follow (approximately) a Normal distribution and scores on midterm 2 follow (approximately) a Normal distribution. Note that the SD is the same on both exams. The 40th percentile of midterm 1 scores is equal to the 70th percentile of midterm 2 scores. Compute (M1-M2)/SD

qnorm(0.7)

## [1] 0.5244005

70th percentile is 0.524 SDs above the mean

qnorm(0.4)

## [1] -0.2533471

40th percentile is 0.253 SDs below the mean

$M_1 - 0.253\,SD = M_2 + 0.524\,SD$

$(M_1-M_2)/SD = 0.524+0.253 = 0.78$

The mean for midterm 1 is 0.78 standard deviations higher than the mean for midterm 2.
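A short sketch confirms the algebra. The values `SD = 10` and `M2 = 70` are hypothetical, picked only to test that the two percentiles line up. The problem does not give actual means or an SD.

```r
# Standardized positions of the two percentiles
z40 <- qnorm(0.4)  # negative: below the mean
z70 <- qnorm(0.7)  # positive: above the mean

# M1 + z40 * SD = M2 + z70 * SD  =>  (M1 - M2) / SD = z70 - z40
diff_in_sd <- z70 - z40
diff_in_sd

# Sanity check with hypothetical numbers (not from the problem)
SD <- 10
M2 <- 70
M1 <- M2 + diff_in_sd * SD
c(midterm1_40th = qnorm(0.4, mean = M1, sd = SD),
  midterm2_70th = qnorm(0.7, mean = M2, sd = SD))
```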

Problem 5

The latest series of collectible Lego Minifigures contains 3 different Minifigure prizes (labeled 1, 2, 3). Each package contains a single unknown prize. Suppose we only buy 3 packages and we consider as our sample space outcome the results of just these 3 packages (prize in package 1, prize in package 2, prize in package 3). For example, 323 (or (3, 2, 3)) represents prize 3 in the first package, prize 2 in the second package, prize 3 in the third package. Let $X$ be the number of distinct prizes obtained in these 3 packages. Let $Y$ be the number of these 3 packages that contain prize 1. Suppose that each package is equally likely to contain any of the 3 prizes, regardless of the contents of other packages. There are 27 possible, equally likely outcomes

box1	box2	box3
1	1	1
2	1	1
3	1	1
1	2	1
2	2	1
3	2	1
1	3	1
2	3	1
3	3	1
1	1	2
2	1	2
3	1	2
1	2	2
2	2	2
3	2	2
1	3	2
2	3	2
3	3	2
1	1	3
2	1	3
3	1	3
1	2	3
2	2	3
3	2	3
1	3	3
2	3	3
3	3	3

Evaluate $X$ and $Y$ for each of the outcomes

See table
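One way to fill in $X$ and $Y$ for every row is sketched below. It assumes the outcome table is rebuilt with `expand.grid()`, which lists the 27 outcomes in the same order as the table above (box1 changing fastest).

```r
# All 27 outcomes, in the same order as the table above
outcomes <- expand.grid(box1 = 1:3, box2 = 1:3, box3 = 1:3)

# X = number of distinct prizes; Y = number of packages containing prize 1
outcomes$X <- apply(outcomes[, c("box1", "box2", "box3")], 1,
                    function(row) length(unique(row)))
outcomes$Y <- apply(outcomes[, c("box1", "box2", "box3")], 1,
                    function(row) sum(row == 1))

outcomes
```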

Construct a two-way table representing the joint distribution of $X$ and $Y$.

	y
x	0	1	2	3	Tot
1	2/27	0	0	1/27	3/27
2	6/27	6/27	6/27	0	18/27
3	0	6/27	0	0	6/27
Tot	8/27	12/27	6/27	1/27	1
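
The counts behind the two-way table can be reproduced by enumerating the 27 outcomes and cross-tabulating. This is a sketch using base R's `table()` and `addmargins()`. Dividing each count by 27 gives the probabilities.

```r
# Enumerate the 27 equally likely outcomes and evaluate X and Y on each
outcomes <- expand.grid(1:3, 1:3, 1:3)
X <- apply(outcomes, 1, function(row) length(unique(row)))
Y <- apply(outcomes, 1, function(row) sum(row == 1))

# Cross-tabulate: each cell counts the outcomes with that (x, y) pair
joint_counts <- table(X, Y)
addmargins(joint_counts)

# Joint probabilities: counts out of 27
joint_probs <- joint_counts / nrow(outcomes)
```
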
Sketch a plot representing the joint distribution of X and Y.
Construct a spinner corresponding to the joint distribution of $X$ and $Y$.

# Labels "x, y" and probabilities for the six possible (X, Y) pairs
labels <- c("1, 0", "1, 3", "2, 0", "2, 1", "2, 2", "3, 1")
values <- c(2/27, 1/27, 6/27, 6/27, 6/27, 6/27)
cols <- rainbow(length(labels))

# Create a pie chart (spinner); pass the colors so the legend matches the sectors
pie(values, labels = labels, col = cols, main = "Spinner for (X, Y)")

# Adding a legend
legend("topright", legend = labels, fill = cols, cex = 0.8)

Describe two ways to simulate an (X,Y) pair.

Label 3 marbles 1, 2, and 3. Put them in a hat, draw one, record its number, and put it back. Do this 3 times in total, then compute X and Y for the resulting outcome.

Spin the spinner for the joint distribution once; the sector it lands on gives the (X, Y) pair.
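Both approaches can be sketched in R with `sample()`. The seed is arbitrary and only makes the draw reproducible, and the data frame `pairs` is an illustrative stand-in for the spinner sectors.

```r
set.seed(518)  # arbitrary seed, for reproducibility only

# Way 1: simulate the three packages (marbles drawn with replacement),
# then compute X and Y from the outcome
prizes <- sample(1:3, size = 3, replace = TRUE)
x1 <- length(unique(prizes))  # number of distinct prizes
y1 <- sum(prizes == 1)        # number of packages with prize 1
c(X = x1, Y = y1)

# Way 2: "spin" the joint spinner by sampling an (x, y) pair directly
pairs <- data.frame(x = c(1, 1, 2, 2, 2, 3),
                    y = c(0, 3, 0, 1, 2, 1))
probs <- c(2, 1, 6, 6, 6, 6) / 27
i <- sample(nrow(pairs), size = 1, prob = probs)
pairs[i, ]
```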

Identify the marginal distribution of X, and construct a corresponding spinner.

x	P(X=x)
1	3/27
2	18/27
3	6/27

Identify the marginal distribution of Y, and construct a corresponding spinner.

y	P(Y=y)
0	8/27
1	12/27
2	6/27
3	1/27

Describe in detail in words how you could conduct a simulation and use the results to approximate. (This part is asking you to describe the process in words; not to write code.)

1. To find P(X=3), repeat the simulation many times, count the simulated values of X that equal 3, and divide by the total number of repetitions.
2. To find P(X=2, Y=1), repeat the simulation many times, count the simulated pairs with X=2 and Y=1, and divide by the total number of repetitions.
3. To find E(X), repeat the simulation many times and compute the average of the simulated X values.
4. To find E(Y), repeat the simulation many times and compute the average of the simulated Y values.
5. To find E(XY), repeat the simulation many times, compute the product XY for every simulated pair, then average the products.
6. To find SD(X), repeat the simulation many times and compute the standard deviation of the simulated X values.
7. To find SD(Y), repeat the simulation many times and compute the standard deviation of the simulated Y values.
8. To find Cov(X, Y), plug the approximations from the previous parts into Cov(X, Y)=E(XY)-E(X)E(Y).
9. To find Corr(X, Y), plug the approximations from the previous parts into Corr(X,Y)=Cov(X,Y)/(SD(X)SD(Y)).
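
The nine steps above can be sketched as a single simulation. This is one possible implementation, not the only one. The seed and the number of repetitions are arbitrary choices, and the printed approximations will vary from run to run if the seed is changed.

```r
set.seed(518)     # arbitrary seed, for reproducibility only
N_rep <- 10000    # number of simulated sets of 3 packages

# Each repetition: open 3 packages, record X (distinct prizes) and Y (count of prize 1)
sims <- replicate(N_rep, {
  prizes <- sample(1:3, size = 3, replace = TRUE)
  c(X = length(unique(prizes)), Y = sum(prizes == 1))
})
X <- sims["X", ]
Y <- sims["Y", ]

# Steps 1-7: relative frequencies, averages, and standard deviations
p_X3    <- mean(X == 3)
p_X2_Y1 <- mean(X == 2 & Y == 1)
E_X  <- mean(X)
E_Y  <- mean(Y)
E_XY <- mean(X * Y)
SD_X <- sd(X)
SD_Y <- sd(Y)

# Steps 8-9: plug the approximations into the covariance and correlation formulas
cov_XY  <- E_XY - E_X * E_Y
corr_XY <- cov_XY / (SD_X * SD_Y)

round(c(P_X3 = p_X3, P_X2_Y1 = p_X2_Y1, E_X = E_X, E_Y = E_Y, E_XY = E_XY,
        SD_X = SD_X, SD_Y = SD_Y, Cov = cov_XY, Corr = corr_XY), 3)
```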

Code and run the simulation from the previous part and use the simulation results to approximate each of

# Total number of possible outcomes
total_outcomes <- 27

# Number of outcomes where all 3 prizes are distinct (the 3! = 6 orderings of 1, 2, 3)
desired_outcomes <- 6

# Probability of X = 3
prob_X_3 <- desired_outcomes / total_outcomes

# Print the result
print(prob_X_3)

## [1] 0.2222222

P(X=3)=0.222

total_outcomes <- 27  # Total number of equally likely outcomes
desired_outcomes <- 0  # Number of outcomes meeting the condition

for (i in 1:3) {
  for (j in 1:3) {
    for (k in 1:3) {
      if (length(unique(c(i, j, k))) == 2 && sum(c(i, j, k) == 1) == 1) {
        desired_outcomes <- desired_outcomes + 1
      }
    }
  }
}

probability <- desired_outcomes / total_outcomes
probability

## [1] 0.2222222

P(X=2, Y=1)=0.222

# List of all possible outcomes
outcomes <- expand.grid(1:3, 1:3, 1:3)

# Calculate X for each outcome
X_values <- apply(outcomes, 1, function(row) length(unique(row)))

# Calculate the expected value of X
expected_X <- mean(X_values)

expected_X

## [1] 2.111111

E(X)=2.11

# Total number of possible outcomes
total_outcomes <- 27

# Initialize the expected value of Y
expected_Y <- 0

# Iterate through the possible values of Y and calculate the expected value
for (y in 0:3) {
  # Probability of Y being y
  prob_Y <- choose(3, y) * (1/3)^y * (2/3)^(3-y)
  
  # Update the expected value by adding the product of Y and its probability
  expected_Y <- expected_Y + y * prob_Y
}

# Print the expected value of Y
print(expected_Y)

## [1] 1

E(Y)=1

# Create a data frame listing all 27 equally likely outcomes
outcomes <- expand.grid(1:3, 1:3, 1:3)

# Calculate the values of X and Y for each outcome
X <- apply(outcomes, 1, function(row) length(unique(row)))
Y <- apply(outcomes, 1, function(row) sum(row == 1))

# Calculate E(XY): average of XY over the 27 outcomes
# (length(outcomes) is 3, the number of columns, so it is the wrong divisor)
EXY <- mean(X * Y)

EXY

## [1] 2.111111

E(XY)=2.11

# Define the probability distribution of X (marginal distribution from the table)
prob_X <- c(3/27, 18/27, 6/27) # Probabilities for X = 1, 2, 3

# Values of X
values_X <- 1:3

# Mean of X
mean_X <- sum(values_X * prob_X)

# Calculate SD(X)
sd_X <- sqrt(sum(prob_X * (values_X - mean_X)^2))

sd_X

## [1] 0.5665577

SD(X)=0.567

# Y counts the packages (out of 3) containing prize 1, so Y is Binomial(3, 1/3)
n <- 3
p <- 1/3
sd_Y <- sqrt(n * p * (1 - p))
sd_Y

## [1] 0.8164966

SD(Y)=0.816

Cov(X,Y)=E(XY)-E(X)E(Y)=2.11-(2.11)(1)=0

Corr(X,Y)=Cov(X,Y)/(SD(X)SD(Y))=0/((0.567)(0.816))=0

X and Y are uncorrelated, but they are not independent: for example, P(X=3, Y=0)=0 while P(X=3)P(Y=0)=(6/27)(8/27) is positive.

Refer to above