Consider the Binomial distribution with \(n = 24\) and \(p = .9\) that is used to model the number of correctly received bits on a satellite link that transmits data in \(24\)-bit blocks.
#Answer 1a
# Parameters
n <- 24
p <- 0.9
# Possible number of successes (from 0 to 24)
k <- 0:24
# PDF
pdf <- dbinom(k, n, p)
# CDF
cdf <- pbinom(k, n, p)
# Plotting the PDF
par(mfrow=c(1,2)) # This makes the next two plots appear side-by-side
plot(k, pdf, type="h",
main="Probability Density Function (PDF)",
xlab="Number of correctly received bits",
ylab="Probability",
ylim=c(0, max(pdf)),
col="blue", lwd=2)
# Plotting the CDF
plot(k, cdf, type="s",
main="Cumulative Distribution Function (CDF)",
xlab="Number of correctly received bits",
ylab="Probability",
ylim=c(0,1),
col="red", lwd=2)
#Answer 1b
#(make sure that what you print make sense to me) - for example to print mean you can use cat command once you done with the calculation and have value in the variable
#variable_name=10
#cat("Mean number of correctly received bits ", variable_name)
# Parameters
n <- 24
p <- 0.9
# Calculate mean and standard deviation
mean_bits <- n * p
std_dev_bits <- sqrt(n * p * (1-p))
# Print the results
cat("Mean number of correctly received bits: ", mean_bits, "\n")
## Mean number of correctly received bits: 21.6
cat("Standard deviation of correctly received bits: ", std_dev_bits, "\n")
## Standard deviation of correctly received bits: 1.469694
#Answer 1c
# Parameters
n <- 24
p <- 0.9
q <- 1 - p
# Probability of 3 or fewer bit errors
prob_3_or_fewer_errors <- pbinom(3, n, q)
# Probability of more than 3 bit errors
prob_more_than_3_errors <- 1 - prob_3_or_fewer_errors
# Print the result
cat("Probability of more than 3-bit errors in the block of 24: ", prob_more_than_3_errors, "\n")
## Probability of more than 3-bit errors in the block of 24: 0.2142622
#Answer 1d
# Parameters
n <- 24
p <- 0.9
# Find the median
median_val <- qbinom(0.5, n, p)
# Print the result
cat("The median of the distribution is: ", median_val, "\n")
## The median of the distribution is: 22
If the median value is close to 24 (the maximum), the majority (more than half) of the 24-bit blocks will have the median value or fewer bits correctly received. Since p is relatively high (0.9), we expect the median to be close to 24.
If the median is 22, it means that more than half of the 24-bit blocks will have 22 or fewer bits correctly received, and the other half will have more than 22 bits correctly received. Given our high probability of success p, this would suggest that most of the time, almost all bits are correctly received, but there is still a substantial chance of getting a few errors.
#Answer 1e
# Parameters
n <- 24
p <- 0.9
# Find the 60th quantile
quantile_60 <- qbinom(0.6, n, p)
# Print the result
cat("The 60th quantile of the distribution is: ", quantile_60, "\n")
## The 60th quantile of the distribution is: 22
#Interpretation
cat("Interpretation: ", "The 60th quantile value of ", quantile_60, " means that 60% of the 24-bit blocks will have ", quantile_60, " or fewer bits correctly received. Given the high probability of success (", p, "), this value suggests that the majority of the transmissions are quite reliable, with only a few bits being received incorrectly in most blocks.", "\n")
## Interpretation: The 60th quantile value of 22 means that 60% of the 24-bit blocks will have 22 or fewer bits correctly received. Given the high probability of success ( 0.9 ), this value suggests that the majority of the transmissions are quite reliable, with only a few bits being received incorrectly in most blocks.
The Public Service Answering Point (PSAP) in San Francisco employs \(19\) operators in \(8\)-hour shifts to process \(911\) calls. There are at least \(5\) operators always answering calls. The number of calls processed per operator can be modeled with a Poisson random variable with rate \(\lambda_0 =20\) calls per hour.
#Answer 2a
lambda_0 <- 20
# Probability of processing 20 calls in an hour
prob_20_calls <- dpois(20, lambda_0)
# Probability of processing 30 calls in an hour
prob_30_calls <- dpois(30, lambda_0)
# Print the results
cat("Probability an operator can process 20 calls in an hour: ", prob_20_calls, "\n")
## Probability an operator can process 20 calls in an hour: 0.08883532
cat("Probability an operator can process 30 calls in an hour: ", prob_30_calls, "\n")
## Probability an operator can process 30 calls in an hour: 0.008343536
#Answer 2b
lambda_0 <- 20
calls_per_operator <- 24
# Probability of an operator processing 24 calls in an hour
prob_24_calls <- dpois(calls_per_operator, lambda_0)
# Probability all ten operators process 24 calls each in an hour
prob_all_10_operators <- prob_24_calls^10
# Print the result
cat("Probability that the ten operators can process all 240 calls in an hour (split equally): ", prob_all_10_operators, "\n")
## Probability that the ten operators can process all 240 calls in an hour (split equally): 2.892317e-13
#Answer 2c
lambda_c <- 85
# Cumulative probability of 0 to 100 calls
cumulative_prob_100 <- ppois(100, lambda_c)
# Probability of more than 100 calls
prob_more_than_100 <- 1 - cumulative_prob_100
# Print the result
cat("Probability of more than 100 calls in an hour: ", prob_more_than_100, "\n")
## Probability of more than 100 calls in an hour: 0.04934533
For the two random number generator below A and B(don’t forget to add your R code) ### - [A] \(Z_i = (9Z_{i-1} + 1) \mod 16\) with \(Z_0 = 5\). ### - [B] \(Z_i = (7Z_{i-1} + 3) \mod 32\) with \(Z_0 = 10\),
#Answer 3a
generate_sequence <- function(a, c, m, Z0) {
Z_values <- c(Z0)
U_values <- c(Z0 / m)
i <- 1
while (TRUE) {
Zi <- (a * Z_values[i] + c) %% m
Ui <- Zi / m
# If Zi value repeats, break
if (Zi %in% Z_values) break
Z_values <- c(Z_values, Zi)
U_values <- c(U_values, Ui)
i <- i + 1
}
return(list(Z=Z_values, U=U_values))
}
# For generator [A]
generator_A <- generate_sequence(a=9, c=1, m=16, Z0=5)
cat("Generator [A] - Zi values:", generator_A$Z, "\n")
## Generator [A] - Zi values: 5 14 15 8 9 2 3 12 13 6 7 0 1 10 11 4
cat("Generator [A] - Ui values:", generator_A$U, "\n")
## Generator [A] - Ui values: 0.3125 0.875 0.9375 0.5 0.5625 0.125 0.1875 0.75 0.8125 0.375 0.4375 0 0.0625 0.625 0.6875 0.25
# For generator [B]
generator_B <- generate_sequence(a=7, c=3, m=32, Z0=10)
cat("Generator [B] - Zi values:", generator_B$Z, "\n")
## Generator [B] - Zi values: 10 9 2 17 26 25 18 1
cat("Generator [B] - Ui values:", generator_B$U, "\n")
## Generator [B] - Ui values: 0.3125 0.28125 0.0625 0.53125 0.8125 0.78125 0.5625 0.03125
#Answer
cat("Answer: All the parameters - a (multiplier), c (increment), and Z0 (seed) - affect the sequence generated by the LCG and can influence its period. However, the maximum achievable period is primarily determined by the choices of a, c, and the modulus m, while Z0 determines where in the sequence we start.", "\n")
## Answer: All the parameters - a (multiplier), c (increment), and Z0 (seed) - affect the sequence generated by the LCG and can influence its period. However, the maximum achievable period is primarily determined by the choices of a, c, and the modulus m, while Z0 determines where in the sequence we start.
#Answer
library(ggplot2)
# Create scatter plot for a given sequence
scatter_plot <- function(Z_values, title) {
data <- data.frame(Zi=Z_values[-length(Z_values)], Zi_next=Z_values[-1])
plot <- ggplot(data, aes(x=Zi, y=Zi_next)) +
geom_point() +
theme_minimal() +
labs(title=title, x="Zi", y="Zi+1")
print(plot)
}
# Display scatter plots for both generators
scatter_plot(generator_A$Z, "Generator [A] - Lag Plot of Zi Values")
scatter_plot(generator_B$Z, "Generator [B] - Lag Plot of Zi Values")
Generator B’s scatter plot was evaluated for randomness properties essential for effective random number generation. The dispersion of points across the plane was observed to determine the presence of patterns or sequences; ideally, points should be randomly spread without discernible clusters or arrangements. For Generator B, a lack of visible patterns and sequences suggests strong randomness. Furthermore, the uniform distribution of its points indicates every potential number has an equal likelihood of being generated, a key criterion for random number generators. The absence of noticeable lines or periodic behaviors in the scatter plot reinforces its unpredictability, making it desirable for applications demanding randomness. While the scatter plot offers valuable visual insights into the generator’s behavior, additional statistical tests would further affirm its efficacy. In essence, Generator B exhibits promising randomness properties based on the visual assessment, making it a strong candidate for generating unpredictable sequences.
#Answer
# Generate 100 random numbers using runif
random_numbers <- runif(100)
# Create a scatter plot for these numbers
library(ggplot2)
data <- data.frame(Ui=random_numbers[-length(random_numbers)], Ui_next=random_numbers[-1])
plot <- ggplot(data, aes(x=Ui, y=Ui_next)) +
geom_point() +
theme_minimal() +
labs(title="Scatter Diagram of runif Generated Numbers", x="Ui", y="Ui+1")
print(plot)
#Answer
mean_value <- mean(random_numbers)
cat("Mean value of Ui across the period:", mean_value, "\n")
## Mean value of Ui across the period: 0.4737322
#Answer
library(ggplot2)
# Histogram for Generator A
generator_A_data <- data.frame(Values = generator_A$Z)
plot_A <- ggplot(generator_A_data, aes(Values)) +
geom_histogram(binwidth = 1, fill = "skyblue", color = "black", alpha = 0.7) +
labs(title = "Histogram for Generator A", x = "Zi values", y = "Frequency") +
theme_minimal()
print(plot_A)
# Histogram for Generator B
generator_B_data <- data.frame(Values =generator_B$Z)
plot_B <- ggplot(generator_B_data, aes(Values)) +
geom_histogram(binwidth = 1, fill = "lightcoral", color = "black", alpha = 0.7) +
labs(title = "Histogram for Generator B", x = "Zi values", y = "Frequency") +
theme_minimal()
print(plot_B)
For Generator A: It demonstrates good uniformity. The distribution is even across its entire range, making it a reliable random number generator for its modulus. For Generator B: It lacks uniformity. Despite having a larger modulus than Generator A, its limited range of outputs and the uneven spread of Ui values show that it isn’t uniformly generating numbers across its range. Generator A exhibits superior uniformity between the two generators compared to Generator B. While Generator A produces numbers that seem to be distributed uniformly across its entire range, Generator B has significant limitations in its range of outputs, leading to a lack of uniformity. Generator A is more unified.
Using the inverse transform method
\[ F(x) = 1 - e^{-(x/\lambda)^k} \]
where \(x \geq 0\), \(\lambda \geq 0\), and \(k \geq 0\).
# Answer 4a
# This might be hard to print using R - you can write the step in your notebook and then include image here / or you can include as seperate file when upload
# Check if the IRdisplay package is already installed
if (!require(IRdisplay, quietly = TRUE)) {
# If not installed, install it
install.packages("IRdisplay")
# Load the IRdisplay library
library(IRdisplay)
} else {
# If already installed, just load the library
library(IRdisplay)
}
# Embed an image
The answer for 4(a) will be upload as a seperate file.
#Answer
# Define the inverse function
inverse_cdf <- function(U, lambda, k) {
return(lambda * (-log(1 - U))^(1/k))
}
# Given U values
U <- c(0.3125, 0.875, 0.9375)
# Given parameters
lambda <- 1
k <- 5
# Calculate x values
x_values <- sapply(U, inverse_cdf, lambda = lambda, k = k)
x_values
## [1] 0.8217415 1.1576822 1.2262446
#Answer
# Required Libraries
library(ggplot2)
# Define the inverse function
inverse_cdf <- function(U, lambda, k) {
return(lambda * (-log(1 - U))^(1/k))
}
# Parameters
lambda <- 1
k <- 5
n <- 10000
# Generate U values and then x values
U <- runif(n)
x_values <- sapply(U, inverse_cdf, lambda = lambda, k = k)
# Plot a histogram
ggplot(data.frame(x_values), aes(x=x_values)) +
geom_histogram(aes(y=..density..), bins=50, fill="blue", alpha=0.7, color="black") +
theme_minimal() +
labs(title="Histogram of Generated Values", x="Value", y="Density")
## Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(density)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Highly differentiate from density plot of generator A and generator B, this plot is unimodal and normally distributed, which make this generator much more random than generator A and B.
Develop a Monte Carlo simulation in R that counts the number of uniform [0,1] random numbers that must be summed to get a sum greater than 1. Run a single simulation with n = 10,000 times and find the mean of the number of counts. Do you think this number looks somewhat familiar. Every student might get slightly different value any ideas why?
#Answer
simulate_counts <- function(n) {
counts <- numeric(n)
for (i in 1:n) {
sum <- 0
count <- 0
while (sum <= 1) {
sum <- sum + runif(1)
count <- count + 1
}
counts[i] <- count
}
return(counts)
}
# Run the simulation 10,000 times
n <- 10000
counts <- simulate_counts(n)
# Find the mean of the counts
mean_count <- mean(counts)
mean_count
## [1] 2.7179
The result might be close to the number e (approximately 2.71828…), which is the base of the natural logarithm. Every student getting a slightly different value is due to the randomness inherent in the Monte Carlo method. Each student’s simulation will generate different random numbers even though they all follow the same uniform distribution. Hence, the counts and the resulting mean will differ slightly between students.
For our specific random variable, we know that its PDF is defined by the function \(f(x) = 10x(1-x)\). Utilize the accept-reject algorithm to draw samples from this distribution. Take a look at Lecture 10 slide 8 and 9
#Answer
# Define the target PDF
f <- function(x) {
return(10*x*(1-x))
}
# Define the proposal distribution
g <- function(x) {
return(1) # Uniform distribution
}
# Accept-reject sampling function
sample_from_f <- function(n) {
samples <- numeric(n)
c <- 2.5
i <- 1
while (i <= n) {
# Step 2: Generate X from g
X <- runif(1, 0, 1)
# Step 3: Generate U from [0,1]
U <- runif(1, 0, 1)
# Step 4: Accept or Reject
if (U <= f(X) / (c * g(X))) {
samples[i] <- X
i <- i + 1
}
}
return(samples)
}
# Drawing samples
n <- 10000
samples <- sample_from_f(n)
# Visualize using a histogram
hist(samples, breaks=50, prob=TRUE, main="Histogram of Samples", xlab="x", ylab="Density")
curve(f, add=TRUE, col="red")