DATA 605: Assignment 9

Central Limit Theorm

(1) First write a function that will produce a sample of random variable that is distributed as follows:

\[ f(x) =x, 0 \leq x \leq 1 \]

\[ f(x) = 2 - x, 1 < x \leq 2 \]

fx1 <- function(x) {
  if (x >= 0 && x <= 2) return (ifelse(x > 1, 2 - x, x))
}

(2) Now, write a function that will produce a sample of random variable that is distributed as follows:

\[ f(x) = 1 - x, 0 \leq x \leq 1 \] \[ f(x) = x - 1, 1 < x \leq 2 \]

fx2 <- function(x) {
  if (x >= 0 && x <= 2) return (ifelse(x > 1, x - 1, 1 - x))
}

(3) Draw 1000 samples (call your function 1000 times each) from each of the above two distributions and plot the resulting histograms. You should have one histogram for each PDF. See that it matches your understanding of these PDFs

# first generate a sequence of values between 0 and 2

n <- seq(from = 0, to = 2, length.out = 10000)

# apply function1 to the seqence and capture the resulting probabilities

prob1 <- sapply(n, fx1)

# lets look at the some of the probability values calculated by the first PDF
head(prob1, 10)

##  [1] 0.00000000 0.00020002 0.00040004 0.00060006 0.00080008 0.00100010
##  [7] 0.00120012 0.00140014 0.00160016 0.00180018

# apply function1 to the seqence and capture the resulting probabilities

prob2 <- sapply(n, fx2)

# lets look at the some of the probability values calculated by the second PDF
head(prob2, 10)

##  [1] 1.0000000 0.9998000 0.9996000 0.9993999 0.9991999 0.9989999 0.9987999
##  [8] 0.9985999 0.9983998 0.9981998

Note. If we plot the PDF probabilities returned from the PDF function in a histogram, we can see that the resulting plot is not showing an expected distribution.

hist(prob1)

Consequently, we need to sample from the function using the calculated probabilities:

pdf1 <- sample(n, 1000, replace=TRUE, prob=prob1)
hist(pdf1)

pdf2 <- sample(n, 1000, replace=TRUE, prob=prob2)
hist(pdf2)

(4) Now, write a program that will take a sample set size n as a parameter and the PDF as the second parameter, and perform 1000 iterations where it samples from the PDF, each time taking n samples and computes the mean of these n samples. It then plots a histogram of these 1000 means that it computes.

PDF_plot <- function(n, PDF, normal_curve = FALSE) {

  # create a vector of 1000 values between 0 and 2
  vec <- seq(from = 0, to = 2, length.out = 1000)
  
  # Calculate the probabilities using the given PDF.  This is done by applying the supplied
  # PDF to the vector of 1000 values between 0 and 2
  probs <- sapply(vec, PDF)
  
  # perform 1000 iterations, taking n number of samples
  sampled_means <- sapply(1:1000, function(x) mean(sample(vec, n, replace=TRUE, prob = probs)))
  
  # generate a histogram of the sampled means
  h <- hist(sampled_means, main='Distribution of Sampled Means', xlab='Sampled Mean')
  
  # include an option to add a normal curve to the histogram
  if (normal_curve) {
   
      #http://www.statmethods.net/graphs/density.html   
      x <- sampled_means 
      xfit<- seq(min(x), max(x),length=40) 
      yfit<-dnorm(xfit,mean=mean(x),sd=sd(x)) 
      yfit <- yfit*diff(h$mids[1:2])*length(x) 
      lines(xfit, yfit, col="blue", lwd=2)
      
  }
  
  return (mean(sampled_means))
  
}

Call the PDF_plot function using sample size of 10 and the first PDF function fx1.

m1 <- PDF_plot(10, PDF=fx1)

Call the PDF_plot function using sample size of 10 and the second PDF function fx2:

m2 <- PDF_plot(10, PDF=fx2)

(5) Verify that as you set n to something like 10 or 20, each of the two PDFs produce normally distributed mean of samples, empirically verifying the Central Limit Theorem. Please play around with various values of n and you’ll see that even for reasonably small sample sizes such as 10, Central Limit Theorem holds.

We’ll adjust the value of n from 10 to 20 to show the two PDFs produce normally distributed mean of samples. Additionally, to verify the distributions are consistent with the Central Limit Theorem, we’ll overlay a normal curve on top of the histogram.

PDF1 using sample set size of 10:

pdf1_10 <- PDF_plot(10, PDF=fx1, normal_curve = TRUE)

PDF1 using sample set size of 20:

pdf1_20 <- PDF_plot(20, PDF=fx1, normal_curve =TRUE)

PDF2 using sample set size of 10:

pdf2_10 <- PDF_plot(10, PDF=fx2, normal_curve =TRUE)

PDF2 using sample set size of 20:

pdf2_10 <- PDF_plot(20, PDF=fx2, normal_curve =TRUE)