Cuny Data 605 - Assignment# 9

In this week assignment, we will explore simulation on distribution and illustrate the Central Limit theorem. There are various techniques to generate random sample of variables X that has a given distribution.

For Continuous Distribtions:

Inverse Transform Method
Composition method
Convolution method
Acceptance-Rejection method
Box-Muller method

For Discreet Distributions:

Discrete version of Inverse Transform Method

R offers useful commands for commonly employed distributions.
Functions which start with “p”,“q”,“d”, and “r” give the following:

the cumulative distribution function (CDF)
the inverse CDF
the density function (PDF)
random variable having the specified distribution
respectively.

Distribution Table in R

1. Function to produce random variables

First write a function that will produce a sample of random variable that is distributed as follows:
\(f(x)\quad =\quad x,\quad 0\quad \le \quad x\quad \le \quad 1\)
\(f(x)\quad =\quad 2\quad -\quad x,\quad 1\quad <\quad x\quad \le \quad 2\)

We will first plot the function f(x) over the interval [0,2]

x1 <- seq(0, 1, by=0.01)
x2 <- seq(1, 2, by=0.01)
x2 <- x2[-1]

fx1 <- function(x) (x)
fx2 <- function(x) (2-x)

y1 <- fx1(x1)
y2 <- fx2(x2)

x <- c(x1,x2)
y <- c(y1,y2)

plot(x,y)

This function represents Probability Density Function (pdf), f(x) is non-negative, the possible values are between 0 and 1. It remains to be shown that the integral of f(x) over the values [0,2] = 1.

From the graph, it is clear that the area under the triangle is equal to 1, since each side is 1/2 area of 1x1 square. Also, we can integrate f(x) and evaluate it over the interval [0,2].
f(x) is composed of 2 functions that are mutually exclusive, hence we are going to integrate each.

let us denote \({ f }_{ 1 }(x) = x\) and \({ f }_{ 2 }(x) = 2 - x\), where \({ f }_{ 1 }(x)\) is defined when \(0 \le x \le 1\) and \({ f }_{ 2 }(x)\) is defined when \(0 < x \le 1\).

Further more, let us denote \({ F }_{ 1 }(x) = \int { { f }_{ 1 } } (x)dx\) and \({ F }_{ 2 }(x)\ = \int { { f }_{ 2 } } (x)dx\).
By performing the integration we obtain:
\({ F }_{ 1 }(x) = \frac { 1 }{ 2 } { x }^{ 2 } + { c }_{ 1 }\) and \({ F }_{ 2 }(x) = 2x - \frac { 1 }{ 2 } { x }^{ 2 } +\ { c }_{ 2 }\)

We can determine the constants \({ c }_{ 1 }\) and \({ c }_{ 2 }\) knowing that \({ F }_{ 1 }(0) = 0\), hence \({ c }_{ 1 } = 0\) and \({ F }_{ 2 }(1) = \frac { 1 }{ 2 }\), hence \({ c }_{ 2 } = -1\)

Inegrating f(x) over [0,2] is equivalent to integrating \({ f }_{ 1 }(x)\) over [0,1] and \({ f }_{ 2 }(x)\) over [1,2].

Hence we get \({ { (F }_{ 1 } }(1)-{ F }_{ 1 }(0))+({ F }_{ 2 }(2)-{ F }_{ 2 }(1))=(\frac { 1 }{ 2 } -0)+(1-\frac { 1 }{ 2 } )=1\)

We could also evaluate integral using R.

integrate(fx1, 0, 1)

## 0.5 with absolute error < 5.6e-15

integrate(fx2, 1, 2)

## 0.5 with absolute error < 5.6e-15

To generate random variable with the appropriate distribution, we will make use of the inverse transform method. Hence, we need to find the inverse of \(F(x)\). We need therefore the find the inverse of \({ F }_{ 1 }(x)\) and \({ F }_{ 2 }(x)\).
Since \({ F }_{ 1 }(x) = \frac { 1 }{ 2 } { x }^{ 2 }\), finding the inverse of this function is, equating it to u and solving for u.

we obtain:
\(u=\frac { 1 }{ 2 } { x }^{ 2 }\quad \Leftrightarrow \quad 2u={ x }^{ 2 },\quad since\quad 0\le u\le \frac { 1 }{ 2 } ,\quad x=\sqrt { 2u }\)
since \(0\le x\le 1\), we do not need to write \(|x|\).

Since \({ F }_{ 2 }(x)=2x-\frac { 1 }{ 2 } { x }^{ 2 }-1\), finding the inverse of this function is equivalent to equating it to u and solving for u.

We obtain:
\(u=2x-\frac { 1 }{ 2 } { x }^{ 2 }-1 \quad \Leftrightarrow \quad -2u={ x }^{ 2 }-4x+2\)

Completing the square, we get:
\(-2u+2={ x }^{ 2 }-4x+4\quad \Leftrightarrow \quad -2u+2={ (x-2) }^{ 2 },\quad since\quad \frac { 1 }{ 2 } <u\le 1 ,\quad 2-2u>0,\quad hence\quad x-2=\sqrt { 2-2u } \quad \Leftrightarrow \quad x=\sqrt { 2-2u } +2\)
since for \(1<x\le 2\), \(x-2<0\), this give us \(x=2-\sqrt { 2(1-u) }\)

inverse_F <- function(u){
  if(u<=1/2){
    x<-sqrt(2*u)
  } else{
    x<-2-sqrt(2*(1-u))
  }
  return(x)
}

Let us draw a random sample of 1000 entries and plot the histogram.

u1 <- runif(1000, 0, 1)

x_sample1 <-rep(0,1000)

for (i in 1:1000) {
  x_sample1[i] <- inverse_F(u1[i])
}

hist(x_sample1)

Let us now consider the function define as follows:
\(f\left( x \right) =1-x,\quad for\quad 0\le x\le 1\) and
\(f\left( x \right) =x-1,\quad for\quad 1<x\le 2\)

This function represents Probability Density Function (pdf), f(x) is non-negative, the possible values are between 0 and 1. It remains to be shown that the integral of f(x) over the values [0,2] = 1. We will do so using r.

x1 <- seq(0, 1, by=0.01)
x2 <- seq(1, 2, by=0.01)
x2 <- x2[-1]

fx1 <- function(x) (1-x)
fx2 <- function(x) (x-1)

integrate(fx1, 0, 1)

## 0.5 with absolute error < 5.6e-15

integrate(fx2, 1, 2)

## 0.5 with absolute error < 5.6e-15

We will now look at the distribution of f(x).

y1 <- fx1(x1)
y2 <- fx2(x2)

x <- c(x1,x2)
y <- c(y1,y2)

plot(x,y)

To generate random variable with the appropriate distribution, we will make use of the Accept-Reject method. We will consider g(x)=1 on [0,2], hence, the constant c for which \(f\left( x \right)\le cg\left( x \right),\quad for\quad all\quad x\), is 1.

## Could not fixed the axis between the 2 graphs and the line is not being displayed for y = 1
#plot(x,rep(1, length(x)),xaxt="n",yaxt="n", type = 'l', col = 'red') 
#par(new = TRUE)
#plot(x,y, xaxt="n",yaxt="n",type = 'l', col = 'blue') 
#axis(side = 1, at=seq(0,2,0.5), lwd = 2) 
#axis(side = 2, at=seq(0,1.5,0.1), lwd = 2)

To apply the Accept-Reject Method, we need to performed the following stepts:
1. generate x for g(x) 2. generate u from U[0,1] (random uniform distribution) 3. \(if\quad u\le \frac { f\left( x \right) }{ cg\left( x \right) } ,\quad accept\quad x,\quad else\quad start\quad at\quad step\quad 1\)

We will write an r function for the above.

fx1 <- function(x) (1-x)
fx2 <- function(x) (x-1)
fx <- function(x){
  if(x<= 1){
    y<-fx1(x)
  } else{
    y<-fx2(x)
  }
  return(y)
}

u2 <- runif(5000, 0, 1)
u_x<- runif(5000, 0, 2)

x_sample2 <-rep(0,1000)

i<-1
j<-1
while (i <= 1000) {
  if(u2[j]<=fx(u_x[j])){
    x_sample2[i]<-u_x[j]
    i <- i+1
  }
  j<-j+1
}

hist(x_sample2)

From the plots, we can see that both histograms have the same distribution as the original PDF’s.

2. Central Limit Theorem

We will now set out to demostrate the Central Limit theorem by taking random samples of a given size n from the 2 distributions above.

First we will write a program that will take a sample set size n as a parameter and the PDF as the second parameter, and perform 1000 iterations where it samples from the PDF, each time taking n samples and computes the mean of these n samples. It then plots a histogram of these 1000 means that it computes.

clt_function <- function(n, y){
  #initialize 1000 elements array with 0 
  result<-rep(0, 1000)
  
  for (i in 1:1000){
      #take sample from input distribution y, of size n, with replacement
      i_sample <- sample(y, n, replace = TRUE)
      result[i]<-mean(i_sample) 
  }
  hist(result)
}

We will now call our function for various sample size ranging from 1 to 100 using both distributions.

par(mfrow = c(2,2))

clt_function(1, x_sample1)
clt_function(5, x_sample1)
clt_function(10, x_sample1)
clt_function(20, x_sample1)

clt_function(30, x_sample1)
clt_function(50, x_sample1)
clt_function(100, x_sample1)

clt_function(1, x_sample2)

clt_function(5, x_sample2)
clt_function(10, x_sample2)
clt_function(20, x_sample2)
clt_function(30, x_sample2)

clt_function(50, x_sample2)
clt_function(100, x_sample2)

Cuny Data 605 - Assignment# 9

Valerie Briot

November 2, 2016

1. Function to produce random variables

2. Central Limit Theorem