In this assignment, we will verify the Central Limit Theorem using two different probability distributions. We will start the exercise by creating functions that simulate the two distributions. Each function takes a sample size as input and returns one sample set of that length drawn from the distribution.

\(\textbf{Distribution One}\)

\(f(x) = x, 0 \leq x \leq 1\)

\(f(x) = 2 - x, 1 < x \leq 2\)

distOne <- function(sampleSize){
  # Discretize [0, 2] on a 0.01 grid and weight each grid point by f(x)
  grid <- seq(0,2,by=0.01)
  weights <- ifelse(grid <= 1, grid, 2-grid)
  # sample() rescales the weights itself, so they need not sum to 1
  draw <- sample(grid,sampleSize,replace=TRUE,prob=weights)
  return(draw)
}

# example set with 10 elements:

distOne(10)
##  [1] 1.17 0.81 1.25 0.49 0.36 0.55 1.64 0.76 0.63 1.30
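
As a quick sanity check on the sampler above, the following sketch confirms that the triangular density integrates to 1 and that the grid weights passed to `sample()` imply a mean of 1. The names `pdfOne`, `grid`, `w`, `area`, and `gridMean` are illustrative and not part of the original assignment code.

```r
# Hypothetical helper: the density of Distribution One, vectorized over x.
pdfOne <- function(x) ifelse(x >= 0 & x <= 2, ifelse(x <= 1, x, 2 - x), 0)

# The density should integrate to 1 over [0, 2].
area <- integrate(pdfOne, lower = 0, upper = 2)$value

# Mean implied by the discretized weights (same grid as distOne).
grid <- seq(0, 2, by = 0.01)
w <- pdfOne(grid)
gridMean <- sum(grid * w) / sum(w)

print(c(area = area, gridMean = gridMean))
```

Because the sampler only returns multiples of 0.01, it approximates the continuous density rather than reproducing it exactly; a finer grid would tighten the approximation.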

\(\textbf{Distribution Two}\)

\(f(x) = 1 - x, 0 \leq x \leq 1\)

\(f(x) = x - 1, 1 < x \leq 2\)

distTwo <- function(sampleSize){
  # Discretize [0, 2] on a 0.01 grid and weight each grid point by f(x)
  grid <- seq(0,2,by=0.01)
  weights <- ifelse(grid <= 1, 1-grid, grid-1)
  # sample() rescales the weights itself, so they need not sum to 1
  draw <- sample(grid,sampleSize,replace=TRUE,prob=weights)
  return(draw)
}
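
For reference, the mean and variance of the two continuous densities can be worked out by hand. This derivation is not part of the original assignment. Both densities are symmetric about \(x = 1\), so each has mean \(\mu = 1\). Substituting \(u = x - 1\), Distribution One becomes \(1 - |u|\) and Distribution Two becomes \(|u|\) on \(-1 \leq u \leq 1\), which gives the variances:

\[\sigma_1^2 = \int_{-1}^{1} u^2 (1 - |u|)\, du = 2\left(\frac{1}{3} - \frac{1}{4}\right) = \frac{1}{6}\]

\[\sigma_2^2 = \int_{-1}^{1} u^2 |u|\, du = 2 \cdot \frac{1}{4} = \frac{1}{2}\]

The second distribution puts most of its mass near the endpoints 0 and 2, which is why its variance is three times larger.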

The following function takes three parameters (sample size, number of samples, and distribution) and returns the sample sets in a matrix called “draws”. The “dist” parameter lets the function call any previously defined sampling function by name. The function draws all the values in a single call and stores each individual “set” in a column of the matrix.

drawSamples <- function(sampleSize,numSamples,dist){
  draws <- dist(sampleSize*numSamples) 
  draws <- matrix(draws,sampleSize) 
  return(draws)
}

We can see how this works by drawing 12 samples of size 10 from the first probability distribution function:

drawSamples(10,12,distOne)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,] 0.47 1.58 1.17 1.12 0.66 1.01 1.18 0.42 1.07  1.20  0.84  1.11
##  [2,] 1.03 1.16 1.22 0.63 1.56 1.72 1.11 1.79 0.97  0.36  1.07  1.33
##  [3,] 1.24 1.01 1.43 0.49 1.19 0.54 0.93 0.96 0.83  1.56  1.27  0.43
##  [4,] 0.46 0.38 0.93 0.76 1.84 1.09 0.96 0.62 0.56  0.34  1.13  1.77
##  [5,] 1.03 0.19 1.09 0.53 0.78 1.33 0.91 1.83 0.39  0.66  0.86  0.15
##  [6,] 0.42 0.62 0.54 0.89 0.80 0.71 1.59 0.78 0.33  0.60  1.05  0.37
##  [7,] 1.02 1.27 0.93 0.72 0.78 0.53 1.30 0.90 0.95  0.79  0.40  1.12
##  [8,] 1.21 1.55 1.40 1.25 0.91 1.59 1.28 0.92 1.06  0.39  0.85  0.82
##  [9,] 1.11 1.33 1.28 1.22 1.10 0.81 1.54 0.63 1.01  1.22  0.93  1.20
## [10,] 1.12 0.83 1.38 0.99 1.41 0.74 0.63 0.57 1.00  1.57  1.46  1.18
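
The layout above relies on `matrix()` filling column by column, so consecutive draws end up in the same column. A minimal toy example, using made-up integer values rather than real draws, makes this visible:

```r
# Toy input: six "draws" reshaped into three samples of size 2.
toy <- matrix(1:6, nrow = 2)
print(toy)

# Each column is one sample, so colMeans() gives one mean per sample.
toyMeans <- colMeans(toy)
print(toyMeans)
```

Here the columns are (1, 2), (3, 4), and (5, 6). `colMeans(toy)` returns the same values as `apply(toy, 2, mean)`.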

Now we will draw 1000 sample sets of size 10 from each distribution and plot a histogram of all the drawn values for each PDF. The histograms take the shape we would expect from the distributions. We overlay the two linear functions that constitute each PDF over the range x = [0,2] in order to see how the sample values approximate the PDF.

p1 <- drawSamples(10,1000,distOne)
p2 <- drawSamples(10,1000,distTwo)

hist(p1,freq=F,xlab="X",main="Histogram of First Distribution",breaks=30,las=1)
abline(0,1,col="red")
text(0.5,0.7,"f(x) = x")
abline(2,-1,col="red")
text(1.6,0.7,"f(x) = 2 - x")

hist(p2,freq=F,xlab="X",main="Histogram of Second Distribution",breaks=30,las=1)
abline(1,-1,col="red")
text(0.6,0.7,"f(x) = 1 - x")
abline(-1,1,col="red")
text(1.4,0.7,"f(x) = x - 1")

Now we will write a function that takes parameters \(\textit{n}\) and \(\textit{dist}\), where \(\textit{n}\) is the desired sample size and \(\textit{dist}\) is a function that samples from a probability distribution. The function will draw 1000 sample sets of size \(\textit{n}\) and plot the distribution of the \(\textit{mean}\) of each set. We will see that for each probability distribution function, the means of the sets are approximately normally distributed and centered around the mean of the distribution. This is consistent with the Central Limit Theorem, even for sets with a relatively small sample size \(\textit{n}\).

plotPDF <- function(n,dist){
  draws <- dist(n*1000)            # 1000 sample sets of size n, drawn in one call
  draws <- matrix(draws,n)         # one sample set per column
  drawMeans <- apply(draws,2,mean) # Now we calculate the mean of each sample
  hist(drawMeans,freq=F,main=c("Sample Size",n),xlab=c("Mean of Sample Set"),breaks=30)
}
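
One way to make the comparison with the normal distribution explicit is to overlay the density that the Central Limit Theorem predicts. The sketch below does this for Distribution One, whose continuous density has mean 1 and variance 1/6, so the predicted standard deviation of the sample mean is \(\sqrt{(1/6)/n}\). The `sampler` function is a hypothetical stand-in for `distOne` so that the block runs on its own, and the seed is an arbitrary choice for reproducibility.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# Hypothetical stand-in for distOne: same 0.01 grid and weights.
grid <- seq(0, 2, by = 0.01)
sampler <- function(k) {
  sample(grid, k, replace = TRUE, prob = ifelse(grid <= 1, grid, 2 - grid))
}

n <- 10                                            # sample size
means <- colMeans(matrix(sampler(n * 1000), nrow = n))

# CLT prediction for Distribution One: mean 1, variance (1/6) / n.
mu <- 1
se <- sqrt((1 / 6) / n)

hist(means, freq = FALSE, breaks = 30,
     main = paste("Sample Size", n), xlab = "Mean of Sample Set")
curve(dnorm(x, mean = mu, sd = se), add = TRUE, col = "red")
```

If the approximation is good, the red curve should track the tops of the histogram bars. For Distribution Two the same sketch would use a variance of 1/2 in place of 1/6.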

\(\textbf{Distribution One}\)

par(mfrow=c(2,2))
plotPDF(5,distOne)
plotPDF(10,distOne)
plotPDF(100,distOne)
plotPDF(1000,distOne)

\(\textbf{Distribution Two}\)

par(mfrow=c(2,2))
plotPDF(5,distTwo)
plotPDF(10,distTwo)
plotPDF(100,distTwo)
plotPDF(1000,distTwo)
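
The histograms give a visual check. A numerical one is to compare the standard deviation of the sample means with the value \(\sigma/\sqrt{n}\) that the Central Limit Theorem predicts, using \(\sigma^2 = 1/6\) for Distribution One and \(\sigma^2 = 1/2\) for Distribution Two (the variances of the continuous densities). The sketch below is an illustration under those assumptions; `samplers`, `sigma2`, and `results` are hypothetical names, and the samplers are stand-ins for `distOne` and `distTwo` so the block is self-contained.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

grid <- seq(0, 2, by = 0.01)
# Hypothetical stand-ins for distOne and distTwo (same grid and weights).
samplers <- list(
  one = function(k) sample(grid, k, replace = TRUE,
                           prob = ifelse(grid <= 1, grid, 2 - grid)),
  two = function(k) sample(grid, k, replace = TRUE,
                           prob = ifelse(grid <= 1, 1 - grid, grid - 1))
)
sigma2 <- c(one = 1 / 6, two = 1 / 2)  # variances of the continuous PDFs

# One row per (sample size, distribution) combination.
results <- expand.grid(n = c(5, 10, 100), dist = names(samplers),
                       stringsAsFactors = FALSE)

# Standard deviation of 1000 sample means for each combination.
results$empirical_sd <- mapply(function(n, d) {
  sd(colMeans(matrix(samplers[[d]](n * 1000), nrow = n)))
}, results$n, results$dist)

# CLT prediction: sigma / sqrt(n).
results$theoretical_sd <- sqrt(sigma2[results$dist] / results$n)
print(results)
```

The two columns should agree to within a few percent. Small differences are expected from sampling noise and from the 0.01 discretization of the densities.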