Question 1

Consider the Binomial distribution with \(n = 24\) and \(p = .9\) that is used to model the number of correctly received bits on a satellite link that transmits data in \(24\)-bit blocks.

Quesion 1 (a) Plot the probability density function and cumulative distribution function for the number of correctly received bits

#Answer 1a
# Parameters
n <- 24
p <- 0.9

# Possible number of successes (from 0 to 24)
k <- 0:24

# PDF
pdf <- dbinom(k, n, p)

# CDF
cdf <- pbinom(k, n, p)

# Plotting the PDF
par(mfrow=c(1,2)) # This makes the next two plots appear side-by-side

plot(k, pdf, type="h", 
     main="Probability Density Function (PDF)", 
     xlab="Number of correctly received bits", 
     ylab="Probability", 
     ylim=c(0, max(pdf)),
     col="blue", lwd=2)

# Plotting the CDF
plot(k, cdf, type="s", 
     main="Cumulative Distribution Function (CDF)", 
     xlab="Number of correctly received bits", 
     ylab="Probability", 
     ylim=c(0,1),
     col="red", lwd=2)

Quesion 1 (b) What is the mean number of correctly received bits? What is the standard deviation?

#Answer 1b
#(make sure that what you print make sense to me) - for example to print mean you can use cat command once you done with the calculation and have value in the variable
#variable_name=10 
#cat("Mean number of correctly received bits ",  variable_name)

# Parameters
n <- 24
p <- 0.9

# Calculate mean and standard deviation
mean_bits <- n * p
std_dev_bits <- sqrt(n * p * (1-p))

# Print the results
cat("Mean number of correctly received bits: ", mean_bits, "\n")

## Mean number of correctly received bits:  21.6

cat("Standard deviation of correctly received bits: ", std_dev_bits, "\n")

## Standard deviation of correctly received bits:  1.469694

Quesion 1 (c) What is the probability of more than \(3\)-bit errors in the block of \(24\)?

#Answer 1c
# Parameters
n <- 24
p <- 0.9
q <- 1 - p

# Probability of 3 or fewer bit errors
prob_3_or_fewer_errors <- pbinom(3, n, q)

# Probability of more than 3 bit errors
prob_more_than_3_errors <- 1 - prob_3_or_fewer_errors

# Print the result
cat("Probability of more than 3-bit errors in the block of 24: ", prob_more_than_3_errors, "\n")

## Probability of more than 3-bit errors in the block of 24:  0.2142622

Quesion 1 (d) What is the median of this distribution? What can you interpret from the median value?

#Answer 1d
# Parameters
n <- 24
p <- 0.9

# Find the median
median_val <- qbinom(0.5, n, p)

# Print the result
cat("The median of the distribution is: ", median_val, "\n")

## The median of the distribution is:  22

If the median value is close to 24 (the maximum), the majority (more than half) of the 24-bit blocks will have the median value or fewer bits correctly received. Since p is relatively high (0.9), we expect the median to be close to 24.

If the median is 22, it means that more than half of the 24-bit blocks will have 22 or fewer bits correctly received, and the other half will have more than 22 bits correctly received. Given our high probability of success p, this would suggest that most of the time, almost all bits are correctly received, but there is still a substantial chance of getting a few errors.

Quesion 1 (e) What is the 60th quantile of this distribution? What can you interpret from this value?

#Answer 1e
# Parameters
n <- 24
p <- 0.9

# Find the 60th quantile
quantile_60 <- qbinom(0.6, n, p)

# Print the result
cat("The 60th quantile of the distribution is: ", quantile_60, "\n")

## The 60th quantile of the distribution is:  22

#Interpretation
cat("Interpretation: ", "The 60th quantile value of ", quantile_60, " means that 60% of the 24-bit blocks will have ", quantile_60, " or fewer bits correctly received. Given the high probability of success (", p, "), this value suggests that the majority of the transmissions are quite reliable, with only a few bits being received incorrectly in most blocks.", "\n")

## Interpretation:  The 60th quantile value of  22  means that 60% of the 24-bit blocks will have  22  or fewer bits correctly received. Given the high probability of success ( 0.9 ), this value suggests that the majority of the transmissions are quite reliable, with only a few bits being received incorrectly in most blocks.

Question2

The Public Service Answering Point (PSAP) in San Francisco employs \(19\) operators in \(8\)-hour shifts to process \(911\) calls. There are at least \(5\) operators always answering calls. The number of calls processed per operator can be modeled with a Poisson random variable with rate \(\lambda_0 =20\) calls per hour.

Quesion 2 (a) What is the probability an operator can process 20 calls in an hour? Repeat for 30 calls in an hour?

#Answer 2a
lambda_0 <- 20

# Probability of processing 20 calls in an hour
prob_20_calls <- dpois(20, lambda_0)

# Probability of processing 30 calls in an hour
prob_30_calls <- dpois(30, lambda_0)

# Print the results
cat("Probability an operator can process 20 calls in an hour: ", prob_20_calls, "\n")

## Probability an operator can process 20 calls in an hour:  0.08883532

cat("Probability an operator can process 30 calls in an hour: ", prob_30_calls, "\n")

## Probability an operator can process 30 calls in an hour:  0.008343536

Quesion 2 (b) Given that 240 calls occurred in an hour, what is the probability that the ten operators can process them all assuming they are split equally among the operators?

#Answer 2b
lambda_0 <- 20
calls_per_operator <- 24

# Probability of an operator processing 24 calls in an hour
prob_24_calls <- dpois(calls_per_operator, lambda_0)

# Probability all ten operators process 24 calls each in an hour
prob_all_10_operators <- prob_24_calls^10

# Print the result
cat("Probability that the ten operators can process all 240 calls in an hour (split equally): ", prob_all_10_operators, "\n")

## Probability that the ten operators can process all 240 calls in an hour (split equally):  2.892317e-13

Quesion 2 (c) Now, If the aggregate call rate per hour \(\lambda_c\) , is measured by \(\lambda_c = 85\) calls per hour, what is the probability of more than \(100\) calls in an hour

#Answer 2c
lambda_c <- 85

# Cumulative probability of 0 to 100 calls
cumulative_prob_100 <- ppois(100, lambda_c)

# Probability of more than 100 calls
prob_more_than_100 <- 1 - cumulative_prob_100

# Print the result
cat("Probability of more than 100 calls in an hour: ", prob_more_than_100, "\n")

## Probability of more than 100 calls in an hour:  0.04934533

Question 3

For the two random number generator below A and B(don’t forget to add your R code) ### - [A] \(Z_i = (9Z_{i-1} + 1) \mod 16\) with \(Z_0 = 5\). ### - [B] \(Z_i = (7Z_{i-1} + 3) \mod 32\) with \(Z_0 = 10\),

Quesion 3 (a) Compute \(Z_i\) and \(U_i\) for values of \(i\) until a number is repeated, what is the period of both the generator? Provide your comments about the period of both RGN

#Answer 3a
generate_sequence <- function(a, c, m, Z0) {
  Z_values <- c(Z0)
  U_values <- c(Z0 / m)
  i <- 1
  
  while (TRUE) {
    Zi <- (a * Z_values[i] + c) %% m
    Ui <- Zi / m
    
    # If Zi value repeats, break
    if (Zi %in% Z_values) break
    
    Z_values <- c(Z_values, Zi)
    U_values <- c(U_values, Ui)
    
    i <- i + 1
  }
  
  return(list(Z=Z_values, U=U_values))
}

# For generator [A]
generator_A <- generate_sequence(a=9, c=1, m=16, Z0=5)
cat("Generator [A] - Zi values:", generator_A$Z, "\n")

## Generator [A] - Zi values: 5 14 15 8 9 2 3 12 13 6 7 0 1 10 11 4

cat("Generator [A] - Ui values:", generator_A$U, "\n")

## Generator [A] - Ui values: 0.3125 0.875 0.9375 0.5 0.5625 0.125 0.1875 0.75 0.8125 0.375 0.4375 0 0.0625 0.625 0.6875 0.25

# For generator [B]
generator_B <- generate_sequence(a=7, c=3, m=32, Z0=10)
cat("Generator [B] - Zi values:", generator_B$Z, "\n")

## Generator [B] - Zi values: 10 9 2 17 26 25 18 1

cat("Generator [B] - Ui values:", generator_B$U, "\n")

## Generator [B] - Ui values: 0.3125 0.28125 0.0625 0.53125 0.8125 0.78125 0.5625 0.03125

Quesion 3 (b) Which of these parameters effect the period of LCG – \(a\), \(b\), \(Z_0\)

#Answer 
cat("Answer: All the parameters - a (multiplier), c (increment), and Z0 (seed) - affect the sequence generated by the LCG and can influence its period. However, the maximum achievable period is primarily determined by the choices of a, c, and the modulus m, while Z0 determines where in the sequence we start.", "\n")

## Answer: All the parameters - a (multiplier), c (increment), and Z0 (seed) - affect the sequence generated by the LCG and can influence its period. However, the maximum achievable period is primarily determined by the choices of a, c, and the modulus m, while Z0 determines where in the sequence we start.

Quesion 3 (c) For both generators plot a scatter diagram of the Zi values 1 apart. What are your observations from these plots? What do you think about the property of randomness of these generators?

#Answer
library(ggplot2)

# Create scatter plot for a given sequence
scatter_plot <- function(Z_values, title) {
  data <- data.frame(Zi=Z_values[-length(Z_values)], Zi_next=Z_values[-1])
  plot <- ggplot(data, aes(x=Zi, y=Zi_next)) + 
    geom_point() + 
    theme_minimal() +
    labs(title=title, x="Zi", y="Zi+1")
  
  print(plot)
}

# Display scatter plots for both generators
scatter_plot(generator_A$Z, "Generator [A] - Lag Plot of Zi Values")

scatter_plot(generator_B$Z, "Generator [B] - Lag Plot of Zi Values")

Generator B’s scatter plot was evaluated for randomness properties essential for effective random number generation. The dispersion of points across the plane was observed to determine the presence of patterns or sequences; ideally, points should be randomly spread without discernible clusters or arrangements. For Generator B, a lack of visible patterns and sequences suggests strong randomness. Furthermore, the uniform distribution of its points indicates every potential number has an equal likelihood of being generated, a key criterion for random number generators. The absence of noticeable lines or periodic behaviors in the scatter plot reinforces its unpredictability, making it desirable for applications demanding randomness. While the scatter plot offers valuable visual insights into the generator’s behavior, additional statistical tests would further affirm its efficacy. In essence, Generator B exhibits promising randomness properties based on the visual assessment, making it a strong candidate for generating unpredictable sequences.

Quesion 3 (d) Randomness of default random generator of R – Run runif command to generate 100 random numbers and plot the scatter diagram of these numbers ( values 1 apart) and discuss your observations about randomness of this random generator.

#Answer
# Generate 100 random numbers using runif
random_numbers <- runif(100)

# Create a scatter plot for these numbers
library(ggplot2)

data <- data.frame(Ui=random_numbers[-length(random_numbers)], Ui_next=random_numbers[-1])

plot <- ggplot(data, aes(x=Ui, y=Ui_next)) + 
  geom_point() + 
  theme_minimal() +
  labs(title="Scatter Diagram of runif Generated Numbers", x="Ui", y="Ui+1")

print(plot)

Quesion 3 (e) Compute the mean value of \(U_i\) across the period

#Answer
mean_value <- mean(random_numbers)

cat("Mean value of Ui across the period:", mean_value, "\n")

## Mean value of Ui across the period: 0.4737322

Quesion 3 (f) By providing a plot of density (histogram) discuss the uniformity of both of the generators

#Answer
library(ggplot2)

# Histogram for Generator A
generator_A_data <- data.frame(Values = generator_A$Z)
plot_A <- ggplot(generator_A_data, aes(Values)) +  
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black", alpha = 0.7) +
  labs(title = "Histogram for Generator A", x = "Zi values", y = "Frequency") +
  theme_minimal()

print(plot_A)

# Histogram for Generator B
generator_B_data <- data.frame(Values =generator_B$Z) 
plot_B <- ggplot(generator_B_data, aes(Values)) +
  geom_histogram(binwidth = 1, fill = "lightcoral", color = "black", alpha = 0.7) +
  labs(title = "Histogram for Generator B", x = "Zi values", y = "Frequency") +
  theme_minimal()

print(plot_B)

For Generator A: It demonstrates good uniformity. The distribution is even across its entire range, making it a reliable random number generator for its modulus. For Generator B: It lacks uniformity. Despite having a larger modulus than Generator A, its limited range of outputs and the uneven spread of Ui values show that it isn’t uniformly generating numbers across its range. Generator A exhibits superior uniformity between the two generators compared to Generator B. While Generator A produces numbers that seem to be distributed uniformly across its entire range, Generator B has significant limitations in its range of outputs, leading to a lack of uniformity. Generator A is more unified.

Question 4.

Using the inverse transform method

Question 4.(a) Develop an algorithm for the random variable with cumulative distribution function F(x) below.

\[ F(x) = 1 - e^{-(x/\lambda)^k} \]

where \(x \geq 0\), \(\lambda \geq 0\), and \(k \geq 0\).

# Answer 4a
# This might be hard to print using R - you can write the step in your notebook and then include image here / or you can include as seperate file when upload

# Check if the IRdisplay package is already installed
if (!require(IRdisplay, quietly = TRUE)) {
  # If not installed, install it
  install.packages("IRdisplay")
  
  # Load the IRdisplay library
  library(IRdisplay)
} else {
  # If already installed, just load the library
  library(IRdisplay)
}

# Embed an image

The answer for 4(a) will be upload as a seperate file.

Quesion 4 (b) Use the first three Ui values from part (a) and generator [A] of previous problem to create 3 values from the random variable when, \(\lambda = 1\), \(𝑘 = 5\)

#Answer
# Define the inverse function
inverse_cdf <- function(U, lambda, k) {
  return(lambda * (-log(1 - U))^(1/k))
}

# Given U values
U <- c(0.3125, 0.875, 0.9375)

# Given parameters
lambda <- 1
k <- 5

# Calculate x values
x_values <- sapply(U, inverse_cdf, lambda = lambda, k = k)
x_values

## [1] 0.8217415 1.1576822 1.2262446

Quesion 4 (c) sing R generate 10,000 values from your algorithm when = 1 , 𝑘 = 5 and plot a histogram of the density. Discuss what insights you obtain when you look at the plot generated here vs the plot you generated in question 3 (e)

#Answer
# Required Libraries
library(ggplot2)

# Define the inverse function
inverse_cdf <- function(U, lambda, k) {
  return(lambda * (-log(1 - U))^(1/k))
}

# Parameters
lambda <- 1
k <- 5
n <- 10000

# Generate U values and then x values
U <- runif(n)
x_values <- sapply(U, inverse_cdf, lambda = lambda, k = k)

# Plot a histogram
ggplot(data.frame(x_values), aes(x=x_values)) +
  geom_histogram(aes(y=..density..), bins=50, fill="blue", alpha=0.7, color="black") +
  theme_minimal() +
  labs(title="Histogram of Generated Values", x="Value", y="Density")

## Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(density)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Highly differentiate from density plot of generator A and generator B, this plot is unimodal and normally distributed, which make this generator much more random than generator A and B.

Question 5

Develop a Monte Carlo simulation in R that counts the number of uniform [0,1] random numbers that must be summed to get a sum greater than 1. Run a single simulation with n = 10,000 times and find the mean of the number of counts. Do you think this number looks somewhat familiar. Every student might get slightly different value any ideas why?

#Answer
simulate_counts <- function(n) {
  counts <- numeric(n)
  for (i in 1:n) {
    sum <- 0
    count <- 0
    while (sum <= 1) {
      sum <- sum + runif(1)
      count <- count + 1
    }
    counts[i] <- count
  }
  return(counts)
}

# Run the simulation 10,000 times
n <- 10000
counts <- simulate_counts(n)

# Find the mean of the counts
mean_count <- mean(counts)
mean_count

## [1] 2.7179

The result might be close to the number e (approximately 2.71828…), which is the base of the natural logarithm. Every student getting a slightly different value is due to the randomness inherent in the Monte Carlo method. Each student’s simulation will generate different random numbers even though they all follow the same uniform distribution. Hence, the counts and the resulting mean will differ slightly between students.

Question 6

For our specific random variable, we know that its PDF is defined by the function \(f(x) = 10x(1-x)\). Utilize the accept-reject algorithm to draw samples from this distribution. Take a look at Lecture 10 slide 8 and 9

#Answer
# Define the target PDF
f <- function(x) {
  return(10*x*(1-x))
}

# Define the proposal distribution
g <- function(x) {
  return(1) # Uniform distribution
}

# Accept-reject sampling function
sample_from_f <- function(n) {
  samples <- numeric(n)
  c <- 2.5
  
  i <- 1
  while (i <= n) {
    # Step 2: Generate X from g
    X <- runif(1, 0, 1)
    
    # Step 3: Generate U from [0,1]
    U <- runif(1, 0, 1)
    
    # Step 4: Accept or Reject
    if (U <= f(X) / (c * g(X))) {
      samples[i] <- X
      i <- i + 1
    }
  }
  
  return(samples)
}

# Drawing samples
n <- 10000
samples <- sample_from_f(n)

# Visualize using a histogram
hist(samples, breaks=50, prob=TRUE, main="Histogram of Samples", xlab="x", ylab="Density")
curve(f, add=TRUE, col="red")

Homework 3

Yutong Liu