Assignment09

1. Problem Set 1

This week, we’ll empirically verify Central Limit Theorem. We’ll write code to run a

small simulation on some distributions and verify that the results match what we expect

from Central Limit Theorem. Please use R markdown to capture all your experiments and

code. Please submit your Rmd file with your name as the filename.

1 First write a function that will produce a sample of random variable that is distributed

as follows:

f(x) = x, 0<= x<= 1

f(x) = 2 - x, 1 < x <= 2

That is, when your function is called, it will return a random variable between 0

and 2 that is distributed according to the above PDF. Please note that this is

not the same as writing a function and sampling uniformly from it. In

the online session this week, I’ll cover Sampling techniques. You will

find it useful when you do the assignment for this week. In addition,

as usual, there are one-liners in R that will give you samples from a

function. We’ll cover both of these approaches in the online session

Solution

We will follow the below steps outlined in our last online meetup:

Start with the CDF (cumulative distribution function)

Integral of the PDF

Invert it

Uniformly sample a value from [0,1]. Interpret as a probability

Use the inverse to find the corresponding value of the random variable

Given f (x) = x, x in [0, 1] and f (x) = 2 - x, x in [1, 2]

We get F(x), the CDF as

F(x) = square(x) x in [0, 1] and F(x) = 2x - square(x)/2 - 1, x in [1, 2]

The inverse function of the CDF as

F(y)inverse = sqrt(2y), y in [0, 0.5] and

F(y)inverse = 2 - sqrt(2(1 - y)), y in (0.5, 1].

invCDF <- function(y) {
  if (y >= 0 && y <= 2) {
    ret <- ifelse(y < 0.5,sqrt(2*y),2-sqrt(2*(1-y)))
  }
}

#

PDFcreator <- function(invCDF){
  return(sapply(runif(10000),invCDF))
}

random_sample1 <- PDFcreator(invCDF)

2 Now, write a function that will produce a sample of random variable that is distributed

as follows:

f(x) = 1 - x, 0 <= x <= 1

f(x) = x - 1, 1 < x <= 2

Solution

First, we will solve for 0 <= x <= 1

f(x) = 1 -x , 0 <= x <= 1

The integral F(x) is :

F(x) = x - (x^2) / 2 + 0

The inverse of F(x) is:

x - (x^2) / 2 = y

2x - (x^2) = 2y

Substruct 2*y from both sides:

2x - (x^2) - 2y = 2y - 2y

2x - (x^2) - 2y = 0

Solve 2x - (x^2) - 2y = 0

x = 1- sqrt(1 - 2y), x = sqrt(1-2y) + 1

Hence for 0 <= x <= 1, x = 1- sqrt(1 - 2*y)

Next, we will solve for 1 < x <= 2

f(x) = x -1 , 1 < x <= 2

The integral F(x) is:

((x^2) / 2 ) - x + 1

Inverse of F(x)

((x^2) / 2 ) - x + 1 = y

Multiply both sides by 2:

(x^2) - 2x + 2 = 2y

substruct 2y from both sides ##### (x^2) - 2x + 2 - 2*y = 0

solve (x^2) - 2x + 2 - 2y = 0

x= 1 + sqrt(2y - 1) or x = 1 - sqrt(2y - 1)

Hence, for 1 < x <= 2, x= 1 + sqrt(2*y - 1)

Therefore, our R function is:

invCDF2 <- function(y){
    if(y >= 0 && y <= 2){
      ret <- ifelse(y < 0.5, 1-sqrt(1-2*y), 1+sqrt(2*y - 1))
    }
  }

random_sample2 <- PDFcreator(invCDF2)

3. Draw 1000 samples (call your function 1000 times each) from each of the above two

distributions and plot the resulting histograms. You should have one histogram for

each PDF. See that it matches your understanding of these PDFs

library(ggplot2)

qplot(random_sample1)

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

qplot(random_sample2)

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

4 Now, write a program that will take a sample set size n as a parameter and the

PDF as the second parameter, and perform 1000 iterations where it samples from

the PDF, each time taking n samples and computes the mean of

qplot(replicate(1000,mean(PDFcreator(invCDF))))

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

## Warning: position_stack requires constant width: output may be incorrect

qplot(replicate(1000,mean(PDFcreator(invCDF2))))

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

## Warning: position_stack requires constant width: output may be incorrect

5. Verify that as you set n to something like 10 or 20, each of the two PDFs produce

normally distributed mean of samples, empirically verifying the Central Limit

Theorem. Please play around with various values of n and you’ll see that even for

reasonably small sample sizes such as 10, Central Limit Theorem holds.

qplot(replicate(10,mean(PDFcreator(invCDF))))

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

qplot(replicate(20,mean(PDFcreator(invCDF))))

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

## Warning: position_stack requires constant width: output may be incorrect

qplot(replicate(200,mean(PDFcreator(invCDF))))

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

## Warning: position_stack requires constant width: output may be incorrect

qplot(replicate(10,mean(PDFcreator(invCDF2))))

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

## Warning: position_stack requires constant width: output may be incorrect

qplot(replicate(20,mean(PDFcreator(invCDF2))))

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

## Warning: position_stack requires constant width: output may be incorrect

qplot(replicate(200,mean(PDFcreator(invCDF2))))

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Validating using the one liner …

nx <- sample(seq(0,2,by=0.01),10000,replace=TRUE,
             prob=sapply(seq(0,2,by=0.01),function(x) { if (x > 0 && x <= 2) 
             { y <- ifelse(x <=1, 1-x, x-1) } else {y <- 0 }}))

hist(nx,30,freq=F)

invCDF2 <- function(y){
  if(y >= 0 && y <= 2){
    ret <- ifelse(y < 0.5, 1-sqrt(1-2*y), 1+sqrt(2*y - 1))
  }
}

random_sample2 <- PDFcreator(invCDF2)
qplot(random_sample2)

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Assignment09

Mohamed Elmoudni

October 25, 2015

1. Problem Set 1

This week, we’ll empirically verify Central Limit Theorem. We’ll write code to run a

small simulation on some distributions and verify that the results match what we expect

from Central Limit Theorem. Please use R markdown to capture all your experiments and

code. Please submit your Rmd file with your name as the filename.

1 First write a function that will produce a sample of random variable that is distributed

as follows:

f(x) = x, 0<= x<= 1

f(x) = 2 - x, 1 < x <= 2

That is, when your function is called, it will return a random variable between 0

and 2 that is distributed according to the above PDF. Please note that this is

not the same as writing a function and sampling uniformly from it. In

the online session this week, I’ll cover Sampling techniques. You will

find it useful when you do the assignment for this week. In addition,

as usual, there are one-liners in R that will give you samples from a

function. We’ll cover both of these approaches in the online session

Solution

We will follow the below steps outlined in our last online meetup:

Start with the CDF (cumulative distribution function)

Integral of the PDF

Invert it

Uniformly sample a value from [0,1]. Interpret as a probability

Use the inverse to find the corresponding value of the random variable

Given f (x) = x, x in [0, 1] and f (x) = 2 - x, x in [1, 2]

We get F(x), the CDF as

F(x) = square(x) x in [0, 1] and F(x) = 2x - square(x)/2 - 1, x in [1, 2]

The inverse function of the CDF as

F(y)inverse = sqrt(2y), y in [0, 0.5] and

F(y)inverse = 2 - sqrt(2(1 - y)), y in (0.5, 1].

2 Now, write a function that will produce a sample of random variable that is distributed

as follows:

f(x) = 1 - x, 0 <= x <= 1

f(x) = x - 1, 1 < x <= 2

Solution

First, we will solve for 0 <= x <= 1

f(x) = 1 -x , 0 <= x <= 1

The integral F(x) is :

F(x) = x - (x^2) / 2 + 0

The inverse of F(x) is:

x - (x^2) / 2 = y

2x - (x^2) = 2y

Substruct 2*y from both sides:

2x - (x^2) - 2y = 2y - 2y

2x - (x^2) - 2y = 0

Solve 2x - (x^2) - 2y = 0

x = 1- sqrt(1 - 2y), x = sqrt(1-2y) + 1

Hence for 0 <= x <= 1, x = 1- sqrt(1 - 2*y)

Next, we will solve for 1 < x <= 2

f(x) = x -1 , 1 < x <= 2

The integral F(x) is:

((x^2) / 2 ) - x + 1

Inverse of F(x)

((x^2) / 2 ) - x + 1 = y

Multiply both sides by 2:

(x^2) - 2x + 2 = 2y

substruct 2y from both sides ##### (x^2) - 2x + 2 - 2*y = 0

solve (x^2) - 2x + 2 - 2y = 0

x= 1 + sqrt(2y - 1) or x = 1 - sqrt(2y - 1)

Hence, for 1 < x <= 2, x= 1 + sqrt(2*y - 1)

Therefore, our R function is:

3. Draw 1000 samples (call your function 1000 times each) from each of the above two

distributions and plot the resulting histograms. You should have one histogram for

each PDF. See that it matches your understanding of these PDFs

4 Now, write a program that will take a sample set size n as a parameter and the

PDF as the second parameter, and perform 1000 iterations where it samples from

the PDF, each time taking n samples and computes the mean of

5. Verify that as you set n to something like 10 or 20, each of the two PDFs produce

normally distributed mean of samples, empirically verifying the Central Limit

Theorem. Please play around with various values of n and you’ll see that even for

reasonably small sample sizes such as 10, Central Limit Theorem holds.

Validating using the one liner …