Problem Set 1

This week, we’ll empirically verify the Central Limit Theorem. We’ll write code to run a small simulation on some distributions and verify that the results match what the Central Limit Theorem predicts. Please use R Markdown to capture all your experiments and code. Please submit your Rmd file with your name as the filename.

(1) First, write a function that produces a sample of a random variable distributed as follows:

f(x) = x, 0 <= x <= 1 & f(x) = 2 - x, 1 < x <= 2

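pdf1 <- function() {
  # The sum of two independent Uniform(0, 1) draws has the triangular density
  # f(x) = x on [0, 1] and f(x) = 2 - x on (1, 2], which is the PDF requested.
  # (Folding one Uniform(0, 2) draw with x <- 2 - x would only give a flat
  # Uniform(0, 1) sample.)
  x <- runif(1) + runif(1)
  return(x)
}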
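Inverse-transform sampling is another way to draw from this density. The sketch below assumes the CDF F(x) = x^2 / 2 on [0, 1] and F(x) = 1 - (2 - x)^2 / 2 on (1, 2], obtained by integrating f(x); the name rtri_inverse and the seed are hypothetical, chosen only for illustration.

```r
# Hypothetical helper (not part of the assignment): draw n values from PDF 1
# by inverting its CDF, vectorised over n.
rtri_inverse <- function(n) {
  u <- runif(n)
  # u <= 1/2 inverts F(x) = x^2 / 2; u > 1/2 inverts F(x) = 1 - (2 - x)^2 / 2
  ifelse(u <= 0.5, sqrt(2 * u), 2 - sqrt(2 * (1 - u)))
}

set.seed(1)  # arbitrary seed so the run is repeatable
x <- rtri_inverse(10000)
mean(x)  # theoretical mean is 1
var(x)   # theoretical variance is 1/6
```

Comparing the sample mean and variance against 1 and 1/6 is a quick sanity check that a sampler really follows the triangular density.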

(2) Now, write a function that produces a sample of a random variable distributed as follows:

f(x) = 1 - x, 0 <= x <= 1 & f(x) = x - 1, 1 < x <= 2

pdf2 <- function() {
  # Inverse-transform sampling: draw u ~ Uniform(0, 1) and invert the CDF
  u <- runif(1)

  if (u <= 0.5) {
    # Left half, f(x) = 1 - x on [0, 1]: F(x) = x - x^2 / 2, so x = 1 - sqrt(1 - 2u)
    x <- 1 - sqrt(1 - 2 * u)
  } else {
    # Right half, f(x) = x - 1 on (1, 2]: F(x) = 1/2 + (x - 1)^2 / 2, so x = 1 + sqrt(2u - 1)
    x <- 1 + sqrt(2 * u - 1)
  }
  return(x)
}
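One way to sample from this density directly is inverse-transform sampling. Integrating f(x) gives the CDF and its inverse below, where u denotes a Uniform(0, 1) draw (a symbol introduced here for illustration):

```latex
F(x) =
\begin{cases}
x - \dfrac{x^2}{2}, & 0 \le x \le 1,\\[4pt]
\dfrac{1}{2} + \dfrac{(x-1)^2}{2}, & 1 < x \le 2,
\end{cases}
\qquad
F^{-1}(u) =
\begin{cases}
1 - \sqrt{1 - 2u}, & 0 \le u \le \tfrac{1}{2},\\[4pt]
1 + \sqrt{2u - 1}, & \tfrac{1}{2} < u \le 1.
\end{cases}
```

Each half carries probability 1/2 and the density is lowest near x = 1, so a histogram of correct samples should be V-shaped rather than flat.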

(3) Draw 1000 samples (call your function 1000 times each) from each of the above two distributions and plot the resulting histograms. You should have one histogram for each PDF. Check that each histogram matches the shape you expect from its PDF.

# Run pdf1 1,000 times and then plot a histogram of the results
samp1 <- replicate(1000, pdf1(), simplify = TRUE)
hist(samp1, 20)

# Run pdf2 1,000 times and then plot a histogram of the results
samp2 <- replicate(1000, pdf2(), simplify = TRUE)
hist(samp2, 20)
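To make the comparison with the PDFs concrete, the theoretical density can be drawn on top of each histogram. This is a self-contained sketch: dens1, dens2, draws1, and draws2 are hypothetical names, and the samples are generated with vectorised shortcuts (the sum of two uniforms for PDF 1, and 1 plus or minus sqrt(U) for PDF 2) rather than with the functions above.

```r
# Densities taken from problems (1) and (2); names are hypothetical
dens1 <- function(x) ifelse(x <= 1, x, 2 - x)      # PDF 1 on [0, 2]
dens2 <- function(x) ifelse(x <= 1, 1 - x, x - 1)  # PDF 2 on [0, 2]

set.seed(2)  # arbitrary seed so the run is repeatable
n <- 1000
# PDF 1: sum of two independent Uniform(0, 1) draws
draws1 <- runif(n) + runif(n)
# PDF 2: distance from 1 is sqrt(U) (density 2t on [0, 1]), with a random sign
draws2 <- 1 + sample(c(-1, 1), n, replace = TRUE) * sqrt(runif(n))

# freq = FALSE puts the histogram on the density scale so the curve is comparable
hist(draws1, breaks = 20, freq = FALSE, main = "PDF 1")
curve(dens1, from = 0, to = 2, add = TRUE, col = "red")

hist(draws2, breaks = 20, freq = FALSE, main = "PDF 2")
curve(dens2, from = 0, to = 2, add = TRUE, col = "red")
```

If a sampler is correct, the bars should track the red curve: a tent for PDF 1 and a V for PDF 2.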

(4) Now, write a program that takes a sample size n as its first parameter and the PDF as its second parameter, and performs 1000 iterations, each time drawing n samples from the PDF and computing the mean of those n samples. It then plots a histogram of the 1000 means it computed.

** This is clear as mud! **

mudd <- function(n, f) {
  # Run the PDF n times to get n samples, take the mean of the n samples, do this 1,000 times, then plot the means
  # f is the sampling function itself (e.g. pdf1). R functions are ordinary objects,
  # so f is passed by name without parentheses and called inside as f().
  # The codes 1 and 2 are also accepted as shorthand for pdf1 and pdf2.
  if (is.numeric(f)) {
    f <- switch(f, pdf1, pdf2)
  }
  means <- replicate(1000, {
    samp <- replicate(n, f(), simplify = TRUE)
    mean(samp)
  })
  hist(means, 20)
}

mudd(25, pdf1)
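Passing a sampler as an argument can also be shown in isolation. The sketch below returns the means instead of plotting them, so they can be inspected numerically before drawing a histogram; sample_means, reps, and tri are hypothetical names, and tri is a stand-in one-sample function for PDF 1.

```r
# Hypothetical helper: `sampler` is any zero-argument function returning one draw
sample_means <- function(n, sampler, reps = 1000) {
  replicate(reps, mean(replicate(n, sampler())))
}

# Stand-in one-sample function for PDF 1 (sum of two uniforms)
tri <- function() runif(1) + runif(1)

set.seed(4)  # arbitrary seed so the run is repeatable
means_named <- sample_means(25, tri)                # pass a function by name
means_anon  <- sample_means(25, function() runif(1))  # or pass an anonymous function

length(means_named)  # one mean per repetition
hist(means_named, 20)
```

Separating the computation from the plot keeps the function reusable: the same vector of means can feed a histogram, a summary, or a normality check.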

(5) Verify that as you set n to something like 10 or 20, each of the two PDFs produces approximately normally distributed sample means, empirically verifying the Central Limit Theorem. Please play around with various values of n and you’ll see that even for reasonably small sample sizes such as 10, the normal approximation from the Central Limit Theorem already holds well.

# Sample size of 10 for both PDF 1 and PDF 2
mudd(10, 1)

mudd(10, 2)

# Sample size of 20 for both PDF 1 and PDF 2
mudd(20, 1)

mudd(20, 2)

# Sample size of 30 for both PDF 1 and PDF 2
mudd(30, 1)

mudd(30, 2)

# Sample size of 40 for both PDF 1 and PDF 2
mudd(40, 1)

mudd(40, 2)

# Sample size of 50 for both PDF 1 and PDF 2
mudd(50, 1)

mudd(50, 2)

# Sample size of 100 for both PDF 1 and PDF 2
mudd(100, 1)

mudd(100, 2)
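Beyond eyeballing the histograms, the Central Limit Theorem makes a numeric prediction: the sample means should have standard deviation sigma / sqrt(n). Integrating the two densities gives a mean of 1 for both, with variance 1/6 for PDF 1 and 1/2 for PDF 2. The sketch below compares the observed spread with that prediction; draw1, draw2, and clt_means are hypothetical names, and the vectorised samplers are shortcuts assumed here, not the functions defined above.

```r
# Hypothetical vectorised samplers for the two PDFs
draw1 <- function(k) runif(k) + runif(k)
draw2 <- function(k) 1 + sample(c(-1, 1), k, replace = TRUE) * sqrt(runif(k))

# Mean of n draws, repeated reps times (one row of the matrix per repetition)
clt_means <- function(n, draw, reps = 1000) {
  rowMeans(matrix(draw(n * reps), nrow = reps))
}

set.seed(3)  # arbitrary seed so the run is repeatable
for (n in c(10, 20, 50)) {
  m1 <- clt_means(n, draw1)
  m2 <- clt_means(n, draw2)
  cat(sprintf("n = %3d  PDF 1: sd %.4f (CLT %.4f)  PDF 2: sd %.4f (CLT %.4f)\n",
              n, sd(m1), sqrt(1 / 6 / n), sd(m2), sqrt(1 / 2 / n)))
}

# Points close to the reference line indicate approximate normality (here n = 50, PDF 2)
qqnorm(m2)
qqline(m2)
```

The observed and predicted standard deviations should agree closely at every n, and the spread should shrink roughly by a factor of sqrt(2) each time n doubles.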