Question [15 Points]

When \(F(x)\) has a simple form and the pdf \(f(x)\) is available, random variables with density \(f(x)\) can be generated by using the ITM method, otherwise we use the rejection method. We know that the Gamma density has a distribution function \[f(x)=Beta(\alpha, \beta)=\frac{\Gamma{(\alpha+\beta)}}{\Gamma{(\alpha)}\Gamma{(\beta)}}x^{\alpha-1}{(1-x)^{(\beta-1)}}, ~~~\alpha, \beta , x\ge 0\]

  1. [3 Marks] Deduce the distribution function \(f(x)=Beta(2, 4)\)
  2. [5 Marks] Assume the function \(g(x)=1\) with \(0<x<1\) for both \(f(x)\) and \(g(x)\). Find the smallest constant c such that \(\frac{f(x)}{g(x)}\le c\).
  3. [3 Marks] Use ARM to generate 1000 random numbers having \(f(x)=Beta(2, 4)\).
  4. [2 Marks] Summarize the generated random numbers as a distribution table.
  5. [2 Marks] Explain why you think the generated data is really a probability distribution.

Solution

  1. Deduce the distribution function \(f(x)=Beta(2, 4)\)

\[f(x)=Beta(2, 4)=\frac{\Gamma{(2+4)}}{\Gamma{(2)}\Gamma{(4)}}x^{2-1}{(1-x)^{(4-1)}}~~=~~20x(1-x)^3,~~ 0\le x \le 1\] \[f(x)~~=~~20x(1-x)^3,~~~~ x \ge 0\]

  1. Assume the function \(g(x)=1\) with \(0<x<1\) for both \(f(x)\) and \(g(x)\). Find the smallest constant c such that \(\frac{f(x)}{g(x)}\le c\).

We note that: \[f(x)~~=~~20x(1-x)^3,~~~~ x \ge 0\] \[g(x)~~=~~1,~~~~ x \ge 0\] \[\frac{d\left(\frac{f(x)}{g(x)}\right)}{dx}=0=\frac{d(20x(1-x)^3)}{dx} \longrightarrow x=\frac{1}{4}\] Therefore, the minimum value \(c\) is obtained by substituting \(x=\frac{1}{4}\) in:

\[\frac{f(x)}{g(x)} = \frac{20x(1-x)^3}{1}=2.109375\] Thus, \[\frac{f(x)}{g(x)} \le 2.109375\]

Hence, \[\frac{f(x)}{cg(x)} =\frac{f(x)}{ 2.109375g(x)}=9.481481x(1-x)^3\]

  1. Use ARM to generate 1000 random numbers having \(f(x)=Beta(2, 4)\).
# General function

    #set.seed(123)
    
    areject <- function(f.fun, C, g.fun, rg, n) {
      naccept <- 0
      ran.sample <- rep(NA, n)
    
      while (naccept < n) {
        y <- rg(1)
        u <- runif(1)
    
        if ( u <= f.fun(y) / (C*g.fun(y)) ) {
          naccept <- naccept + 1
          ran.sample[naccept] = y
        }
      }
    
      return(ran.sample)
    }

# Function Call
    
    f.fun  <- function(x) 20*(x^3)*(1-x)^3
    g.fun  <- function(x) 1
    rg <- runif
    C  <- 2.109375 #upper bound on f (via basic calculus)
    
    n  <- 1000
    result <- areject(f.fun, C, g.fun, rg, n)
    result <- matrix(result, ncol=10)
  1. Summarize the generated random numbers.
a <- 2
b <- 4

emean <- a/(a+b)
evar  <- (a*b)/((a+b)^2*(a+b+1))
emode <- (a-1)/(a+b-2)

smean  <- mean(result)
ssd   <- sd(result)^2

getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}

smode <- getmode(result)

# Histogram Plot
    hist(result, freq = FALSE)
    points( seq(0, 1, 0.01), dbeta( seq(0, 1, 0.01), 2, 4), type = "l")

Empirically, remember that for \(Beta(2,4)\):

\[E(X)=\frac{\alpha}{(\alpha+\beta)}= 0.3333333\]

ARM simulation estimation for \(Beta(2,4)\):

  1. Explain why you think the generated data is really a probability distribution.
# P(X<=1) = ?
    x  <- 1
    p1  <- length(result[result <= x])/n
    cat("Probaility X is at least 1 (P(X<=1) is: ", p1)
## Probaility X is at least 1 (P(X<=1) is:  1
# P(0<=X<=1) = ?
    x1  <- 0
    x2  <- 1
    p2  <- length(result[result <= x2])/n - length(result[result <= x1])/n 
    cat("\nProbaility X is at least 1 (P(X<=1) is: ", p2)
## 
## Probaility X is at least 1 (P(X<=1) is:  1
# P(X<=0) = ?
    x  <- 0
    p3  <- length(result[result < x])/n
    cat("\nProbaility X is less than 0 (P(X< 0) is: ", p3)   
## 
## Probaility X is less than 0 (P(X< 0) is:  0
# P(X>=1) = ?
    x  <- 1
    p4  <- length(result[result > x])/n
    cat("\nProbaility X is greater than 1 (P(X> 1) is: ", p4)   
## 
## Probaility X is greater than 1 (P(X> 1) is:  0

The generated data is likely to be a probability distribution because it satisfies the following properties:

  1. The values are non-negative: All the generated numbers are non-negative since the Beta distribution has a probability density of 0 for negative numbers.

  2. The sum of values is 1: The Beta distribution is a probability distribution, and the (ARM) generates samples that follow this distribution.

  3. The values are within the support of the distribution: The (ARM) generates values that follow the Beta distribution within its support, which is between 0 and 1.

Question [15 Points]

  1. [3 Marks] State the algorithm for the Acceptance Rejection Method.
  2. [5 Marks] Write an R function to generate a sample of n observations from a distribution using the Acceptance-Rejection Method.
  3. [3 Marks] Use the R function to generate 100 numbers having the distribution in Table
    X 1.00 2.00 3.0 4.00 5.00
    Px 0.15 0.35 0.3 0.15 0.05
  4. [2 Marks] Summarize the generated random numbers as a distribution table.
  5. [2 Marks] Explain why you think the generated data is really a probability distribution.

solution

  1. State the algorithm for the Acceptance Rejection Method.

ARM ALGORITHM Simplified

ARM ALGORITHM detailed

ALGORITHM:

  1. Write an R function to generate a sample of n observations from a distribution using the Acceptance-Rejection Method.
arm.fun  <- function(prob.x, C, prob.y, n.gen) {   
    C    <- max(prob.x/prob.y)
    sim.vector <- rep(0, n.gen)
    for(j in 1:n.gen){
          u1 <- runif(1)
          y  <- floor(5*u1) + 1     #Generating Y from discrete Uniform[1,5] with pmf = 1/5
          k  <- (prob.y[y]) * C
          u2 <- runif(1)
          while(u2 > prob.x[y]/k){  #If rejection, it will search for the next possible value.
                    u1 <- runif(1)
                    y  <- floor(5*u1) + 1
                    u2 <- runif(1)  #Needed inside the while loop
          }
    sim.vector[j] <- y
    }
    return(sim.vector)
}
  1. Use the R function to generate 100 numbers having the distribution in Table
    X 1.00 2.00 3.0 4.00 5.00
    Px 0.15 0.35 0.3 0.15 0.05
    set.seed(123)
    n.gen  <- 100
    prob.x <- c(0.15, 0.35, 0.30, 0.15, 0.05)
    prob.y <- rep((1/5),5)

    x.ARM  <- arm.fun(prob.x, C, prob.y, n.gen)
    x.ARM  <- matrix(x.ARM, 10)

ARM Sample Data

    kableExtra::kable(x.ARM) %>% 
      kable_styling(bootstrap_options = "striped", full_width = T, position = "center", fixed_thead = FALSE)
2 3 2 4 2 4 3 2 4 4
5 1 2 3 2 2 2 2 2 2
3 2 2 4 1 4 2 2 1 2
2 2 3 2 3 3 2 3 1 1
2 4 2 2 3 2 1 3 2 5
3 3 2 1 3 2 3 1 5 2
2 4 4 3 4 3 4 3 4 4
4 2 3 2 2 2 3 2 1 2
2 4 3 1 4 2 2 5 3 3
1 3 3 2 3 1 3 1 2 3
  1. Summarize the generated random numbers as a distribution table.
    f_ARM           <- table(x.ARM)/length(x.ARM)
    f_ARM           <- rbind(1:5, f_ARM)
    f_ARM           <- t(f_ARM)
    colnames(f_ARM) <- c("X", "PX")
    
    kableExtra::kable(f_ARM) %>% 
      kable_styling(bootstrap_options = "striped", full_width = T, position = "center", fixed_thead = FALSE)    
X PX
1 0.13
2 0.40
3 0.27
4 0.16
5 0.04
  1. [2 Marks] Explain why you think the generated data is really a probability distribution.

Check for:

\[0 \le P(x_i) \le 1\] \[\sum_{i} P(x_i)=1\] \[Simple~~Plot\]

    f_ARM           <- table(x.ARM)/length(x.ARM)
    f_ARM           <- rbind(1:5, f_ARM)
    f_ARM           <- t(f_ARM)
    colnames(f_ARM) <- c("X", "PX")
    f_ARM           <- data.frame(f_ARM)
    kable(f_ARM)
X PX
1 0.13
2 0.40
3 0.27
4 0.16
5 0.04
    attach(f_ARM)
    barplot(PX~X, col="blue")

\[Good ~ Luck\]