(1) First write a function that will produce a sample of random variable that is distributed as follows:
\[ f(x) =x, 0 \leq x \leq 1 \]
\[ f(x) = 2 - x, 1 < x \leq 2 \]
fx1 <- function(x) {
if (x >= 0 && x <= 2) return (ifelse(x > 1, 2 - x, x))
}
(2) Now, write a function that will produce a sample of random variable that is distributed as follows:
\[ f(x) = 1 - x, 0 \leq x \leq 1 \] \[ f(x) = x - 1, 1 < x \leq 2 \]
fx2 <- function(x) {
if (x >= 0 && x <= 2) return (ifelse(x > 1, x - 1, 1 - x))
}
(3) Draw 1000 samples (call your function 1000 times each) from each of the above two distributions and plot the resulting histograms. You should have one histogram for each PDF. See that it matches your understanding of these PDFs
# first generate a sequence of values between 0 and 2
n <- seq(from = 0, to = 2, length.out = 10000)
# apply function1 to the seqence and capture the resulting probabilities
prob1 <- sapply(n, fx1)
# lets look at the some of the probability values calculated by the first PDF
head(prob1, 10)
## [1] 0.00000000 0.00020002 0.00040004 0.00060006 0.00080008 0.00100010
## [7] 0.00120012 0.00140014 0.00160016 0.00180018
# apply function1 to the seqence and capture the resulting probabilities
prob2 <- sapply(n, fx2)
# lets look at the some of the probability values calculated by the second PDF
head(prob2, 10)
## [1] 1.0000000 0.9998000 0.9996000 0.9993999 0.9991999 0.9989999 0.9987999
## [8] 0.9985999 0.9983998 0.9981998
Note. If we plot the PDF probabilities returned from the PDF function in a histogram, we can see that the resulting plot is not showing an expected distribution.
hist(prob1)
Consequently, we need to sample from the function using the calculated probabilities:
pdf1 <- sample(n, 1000, replace=TRUE, prob=prob1)
hist(pdf1)
pdf2 <- sample(n, 1000, replace=TRUE, prob=prob2)
hist(pdf2)
(4) Now, write a program that will take a sample set size n as a parameter and the PDF as the second parameter, and perform 1000 iterations where it samples from the PDF, each time taking n samples and computes the mean of these n samples. It then plots a histogram of these 1000 means that it computes.
PDF_plot <- function(n, PDF, normal_curve = FALSE) {
# create a vector of 1000 values between 0 and 2
vec <- seq(from = 0, to = 2, length.out = 1000)
# Calculate the probabilities using the given PDF. This is done by applying the supplied
# PDF to the vector of 1000 values between 0 and 2
probs <- sapply(vec, PDF)
# perform 1000 iterations, taking n number of samples
sampled_means <- sapply(1:1000, function(x) mean(sample(vec, n, replace=TRUE, prob = probs)))
# generate a histogram of the sampled means
h <- hist(sampled_means, main='Distribution of Sampled Means', xlab='Sampled Mean')
# include an option to add a normal curve to the histogram
if (normal_curve) {
#http://www.statmethods.net/graphs/density.html
x <- sampled_means
xfit<- seq(min(x), max(x),length=40)
yfit<-dnorm(xfit,mean=mean(x),sd=sd(x))
yfit <- yfit*diff(h$mids[1:2])*length(x)
lines(xfit, yfit, col="blue", lwd=2)
}
return (mean(sampled_means))
}
Call the PDF_plot
function using sample size of 10 and the first PDF function fx1
.
m1 <- PDF_plot(10, PDF=fx1)
Call the PDF_plot
function using sample size of 10 and the second PDF function fx2
:
m2 <- PDF_plot(10, PDF=fx2)
(5) Verify that as you set n to something like 10 or 20, each of the two PDFs produce normally distributed mean of samples, empirically verifying the Central Limit Theorem. Please play around with various values of n and you’ll see that even for reasonably small sample sizes such as 10, Central Limit Theorem holds.
We’ll adjust the value of n
from 10 to 20 to show the two PDFs produce normally distributed mean of samples. Additionally, to verify the distributions are consistent with the Central Limit Theorem, we’ll overlay a normal curve on top of the histogram.
pdf1_10 <- PDF_plot(10, PDF=fx1, normal_curve = TRUE)
pdf1_20 <- PDF_plot(20, PDF=fx1, normal_curve =TRUE)
pdf2_10 <- PDF_plot(10, PDF=fx2, normal_curve =TRUE)
pdf2_10 <- PDF_plot(20, PDF=fx2, normal_curve =TRUE)