Department of Statistics [SP2023 Semester]

Question [15 Points]

When \(F(x)\) has a simple form and the pdf \(f(x)\) is available, random variables with density \(f(x)\) can be generated by using the ITM method, otherwise we use the rejection method. We know that the Gamma density has a distribution function \[f(x)=Beta(\alpha, \beta)=\frac{\Gamma{(\alpha+\beta)}}{\Gamma{(\alpha)}\Gamma{(\beta)}}x^{\alpha-1}{(1-x)^{(\beta-1)}}, ~~~\alpha, \beta , x\ge 0\]

[3 Marks] Deduce the distribution function \(f(x)=Beta(2, 4)\)
[5 Marks] Assume the function \(g(x)=1\) with \(0<x<1\) for both \(f(x)\) and \(g(x)\). Find the smallest constant c such that \(\frac{f(x)}{g(x)}\le c\).
[3 Marks] Use ARM to generate 1000 random numbers having \(f(x)=Beta(2, 4)\).
[2 Marks] Summarize the generated random numbers as a distribution table.
[2 Marks] Explain why you think the generated data is really a probability distribution.

Solution

Deduce the distribution function \(f(x)=Beta(2, 4)\)

\[f(x)=Beta(2, 4)=\frac{\Gamma{(2+4)}}{\Gamma{(2)}\Gamma{(4)}}x^{2-1}{(1-x)^{(4-1)}}~~=~~20x(1-x)^3,~~ 0\le x \le 1\] \[f(x)~~=~~20x(1-x)^3,~~~~ x \ge 0\]

Assume the function \(g(x)=1\) with \(0<x<1\) for both \(f(x)\) and \(g(x)\). Find the smallest constant c such that \(\frac{f(x)}{g(x)}\le c\).

We note that: \[f(x)~~=~~20x(1-x)^3,~~~~ x \ge 0\] \[g(x)~~=~~1,~~~~ x \ge 0\] \[\frac{d\left(\frac{f(x)}{g(x)}\right)}{dx}=0=\frac{d(20x(1-x)^3)}{dx} \longrightarrow x=\frac{1}{4}\] Therefore, the minimum value \(c\) is obtained by substituting \(x=\frac{1}{4}\) in:

\[\frac{f(x)}{g(x)} = \frac{20x(1-x)^3}{1}=2.109375\] Thus, \[\frac{f(x)}{g(x)} \le 2.109375\]

Hence, \[\frac{f(x)}{cg(x)} =\frac{f(x)}{ 2.109375g(x)}=9.481481x(1-x)^3\]

Use ARM to generate 1000 random numbers having \(f(x)=Beta(2, 4)\).

# General function

    #set.seed(123)
    
    areject <- function(f.fun, C, g.fun, rg, n) {
      naccept <- 0
      ran.sample <- rep(NA, n)
    
      while (naccept < n) {
        y <- rg(1)
        u <- runif(1)
    
        if ( u <= f.fun(y) / (C*g.fun(y)) ) {
          naccept <- naccept + 1
          ran.sample[naccept] = y
        }
      }
    
      return(ran.sample)
    }

# Function Call
    
    f.fun  <- function(x) 20*(x^3)*(1-x)^3
    g.fun  <- function(x) 1
    rg <- runif
    C  <- 2.109375 #upper bound on f (via basic calculus)
    
    n  <- 1000
    result <- areject(f.fun, C, g.fun, rg, n)
    result <- matrix(result, ncol=10)

Summarize the generated random numbers.

a <- 2
b <- 4

emean <- a/(a+b)
evar  <- (a*b)/((a+b)^2*(a+b+1))
emode <- (a-1)/(a+b-2)

smean  <- mean(result)
ssd   <- sd(result)^2

getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}

smode <- getmode(result)

# Histogram Plot
    hist(result, freq = FALSE)
    points( seq(0, 1, 0.01), dbeta( seq(0, 1, 0.01), 2, 4), type = "l")

Empirically, remember that for \(Beta(2,4)\):

The Mean:

\[E(X)=\frac{\alpha}{(\alpha+\beta)}= 0.3333333\]

The Standard Deviation: \[sd(X)=\sqrt{\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}} = 0.031746\]
The Mode: \[Mode(X)=\frac{\alpha-1}{(\alpha+\beta-2)}= 0.25\]

ARM simulation estimation for \(Beta(2,4)\):

The Mean \(0.4908253\)
The Standard Deviation \(0.028326\)
The Mode \(0.4374527\)

Explain why you think the generated data is really a probability distribution.

Just Check the probability assumptions

# P(X<=1) = ?
    x  <- 1
    p1  <- length(result[result <= x])/n
    cat("Probaility X is at least 1 (P(X<=1) is: ", p1)

## Probaility X is at least 1 (P(X<=1) is:  1

# P(0<=X<=1) = ?
    x1  <- 0
    x2  <- 1
    p2  <- length(result[result <= x2])/n - length(result[result <= x1])/n 
    cat("\nProbaility X is at least 1 (P(X<=1) is: ", p2)

## 
## Probaility X is at least 1 (P(X<=1) is:  1

# P(X<=0) = ?
    x  <- 0
    p3  <- length(result[result < x])/n
    cat("\nProbaility X is less than 0 (P(X< 0) is: ", p3)

## 
## Probaility X is less than 0 (P(X< 0) is:  0

# P(X>=1) = ?
    x  <- 1
    p4  <- length(result[result > x])/n
    cat("\nProbaility X is greater than 1 (P(X> 1) is: ", p4)

## 
## Probaility X is greater than 1 (P(X> 1) is:  0

The generated data is likely to be a probability distribution because it satisfies the following properties:

The values are non-negative: All the generated numbers are non-negative since the Beta distribution has a probability density of 0 for negative numbers.
The sum of values is 1: The Beta distribution is a probability distribution, and the (ARM) generates samples that follow this distribution.
The values are within the support of the distribution: The (ARM) generates values that follow the Beta distribution within its support, which is between 0 and 1.

Question [15 Points]

[3 Marks] State the algorithm for the Acceptance Rejection Method.
[5 Marks] Write an R function to generate a sample of n observations from a distribution using the Acceptance-Rejection Method.
[3 Marks] Use the R function to generate 100 numbers having the distribution in Table

X 1.00 2.00 3.0 4.00 5.00

Px 0.15 0.35 0.3 0.15 0.05
[2 Marks] Summarize the generated random numbers as a distribution table.
[2 Marks] Explain why you think the generated data is really a probability distribution.

solution

State the algorithm for the Acceptance Rejection Method.

ARM ALGORITHM Simplified

Step 1: Simulate the value of Y, having the probability function \(q_j\)
Step 2: Generate a random number \(U\)
Step 3: If \(U<\frac{p_Y}{q_Y}\), set X=Y and STOP. Otherwise return to Step 1.

ARM ALGORITHM detailed

ALGORITHM:

1. Create a function with n as a parameter
1. Let probx be probabilities of (0.15, 0.35, 0.3 ,0.15, 0.05)
1. Let proby be based on discrete unif(0,1) so the proby is rep((1/5),5)
1. Let c be the max of probx/prby
1. Create empty vector simx = c()
1. Set.seed(123) to obtain same result at every simulation
1. Start the loop for(j in 1:n)
7.1 u1 = runif(1)
7.2 y = floor(5*u1)+1
7.3 u2 = runif(1)
7.4 k = (proby[y])*c to store (proby[y])c in variable k
7.5 while (u2 > probx[y]/k) that is when u2 < ratio assign x to y value then update the following
7.5.1 u1= runif(1)
7.5.2 y= floor(5*u1)+1
7.5.3 u2= runif(1)
7.6 end while loop
1. add y to simx[j]
1. end foor loop
1. display simx
1. end function

Write an R function to generate a sample of n observations from a distribution using the Acceptance-Rejection Method.

arm.fun  <- function(prob.x, C, prob.y, n.gen) {   
    C    <- max(prob.x/prob.y)
    sim.vector <- rep(0, n.gen)
    for(j in 1:n.gen){
          u1 <- runif(1)
          y  <- floor(5*u1) + 1     #Generating Y from discrete Uniform[1,5] with pmf = 1/5
          k  <- (prob.y[y]) * C
          u2 <- runif(1)
          while(u2 > prob.x[y]/k){  #If rejection, it will search for the next possible value.
                    u1 <- runif(1)
                    y  <- floor(5*u1) + 1
                    u2 <- runif(1)  #Needed inside the while loop
          }
    sim.vector[j] <- y
    }
    return(sim.vector)
}

Use the R function to generate 100 numbers having the distribution in Table

X 1.00 2.00 3.0 4.00 5.00

Px 0.15 0.35 0.3 0.15 0.05

    set.seed(123)
    n.gen  <- 100
    prob.x <- c(0.15, 0.35, 0.30, 0.15, 0.05)
    prob.y <- rep((1/5),5)

    x.ARM  <- arm.fun(prob.x, C, prob.y, n.gen)
    x.ARM  <- matrix(x.ARM, 10)

ARM Sample Data

    kableExtra::kable(x.ARM) %>% 
      kable_styling(bootstrap_options = "striped", full_width = T, position = "center", fixed_thead = FALSE)

2	3	2	4	2	4	3	2	4	4
5	1	2	3	2	2	2	2	2	2
3	2	2	4	1	4	2	2	1	2
2	2	3	2	3	3	2	3	1	1
2	4	2	2	3	2	1	3	2	5
3	3	2	1	3	2	3	1	5	2
2	4	4	3	4	3	4	3	4	4
4	2	3	2	2	2	3	2	1	2
2	4	3	1	4	2	2	5	3	3
1	3	3	2	3	1	3	1	2	3

Summarize the generated random numbers as a distribution table.

Probability Distribution Table

    f_ARM           <- table(x.ARM)/length(x.ARM)
    f_ARM           <- rbind(1:5, f_ARM)
    f_ARM           <- t(f_ARM)
    colnames(f_ARM) <- c("X", "PX")
    
    kableExtra::kable(f_ARM) %>% 
      kable_styling(bootstrap_options = "striped", full_width = T, position = "center", fixed_thead = FALSE)

X	PX
1	0.13
2	0.40
3	0.27
4	0.16
5	0.04

[2 Marks] Explain why you think the generated data is really a probability distribution.

Check for:

\[0 \le P(x_i) \le 1\] \[\sum_{i} P(x_i)=1\] \[Simple~~Plot\]

    f_ARM           <- table(x.ARM)/length(x.ARM)
    f_ARM           <- rbind(1:5, f_ARM)
    f_ARM           <- t(f_ARM)
    colnames(f_ARM) <- c("X", "PX")
    f_ARM           <- data.frame(f_ARM)
    kable(f_ARM)

X	PX
1	0.13
2	0.40
3	0.27
4	0.16
5	0.04

    attach(f_ARM)
    barplot(PX~X, col="blue")

Department of Statistics [SP2023 Semester]

STAT3336: Computational Techniques in Statistics - Quiz 4

Monday 8 May 2023: 10:00 am - 12:00 pm

Question [15 Points]

Solution

Empirically, remember that for \(Beta(2,4)\):

ARM simulation estimation for \(Beta(2,4)\):

Question [15 Points]

solution

ARM Sample Data

Check for:

\[Good ~ Luck\]

2	3	2	4	2	4	3	2	4	4
5	1	2	3	2	2	2	2	2	2
3	2	2	4	1	4	2	2	1	2
2	2	3	2	3	3	2	3	1	1
2	4	2	2	3	2	1	3	2	5
3	3	2	1	3	2	3	1	5	2
2	4	4	3	4	3	4	3	4	4
4	2	3	2	2	2	3	2	1	2
2	4	3	1	4	2	2	5	3	3
1	3	3	2	3	1	3	1	2	3

X	1.00	2.00	3.0	4.00	5.00
Px	0.15	0.35	0.3	0.15	0.05

X	1.00	2.00	3.0	4.00	5.00
Px	0.15	0.35	0.3	0.15	0.05

2	3	2	4	2	4	3	2	4	4
5	1	2	3	2	2	2	2	2	2
3	2	2	4	1	4	2	2	1	2
2	2	3	2	3	3	2	3	1	1
2	4	2	2	3	2	1	3	2	5
3	3	2	1	3	2	3	1	5	2
2	4	4	3	4	3	4	3	4	4
4	2	3	2	2	2	3	2	1	2
2	4	3	1	4	2	2	5	3	3
1	3	3	2	3	1	3	1	2	3

2	3	2	4	2	4	3	2	4	4
5	1	2	3	2	2	2	2	2	2
3	2	2	4	1	4	2	2	1	2
2	2	3	2	3	3	2	3	1	1
2	4	2	2	3	2	1	3	2	5
3	3	2	1	3	2	3	1	5	2
2	4	4	3	4	3	4	3	4	4
4	2	3	2	2	2	3	2	1	2
2	4	3	1	4	2	2	5	3	3
1	3	3	2	3	1	3	1	2	3