This week, we’ll empirically verify Central Limit Theorem. We’ll write code to run a small simulation on some distributions and verify that the results match what we expect from Central Limit Theorem. Please use R markdown to capture all your experiments and code. Please submit your Rmd file with your name as the filename. (1) First write a function that will produce a sample of random variable that is distributed as follows:
f(x) = x, 0 \(\leq\) x \(\leq\) 1 f(x) = 2 - x, 1 < x \(\leq\) 2
That is, when your function is called, it will return a random variable between 0 and 2 that is distributed according to the above PDF. Please note that this is not the same as writing a function and sampling uniformly from it. In the online session this week, I’ll cover Sampling techniques. You will find it useful when you do the assignment for this week. In addition, as usual, there are one-liners in R that will give you samples from a function. We’ll cover both of these approaches in the online session.
y1 <- function(x) {
if(x >= 0 & x <= 1) x
else if(x > 1 & x <= 2) 2 - x
}
f(x) = 1 - x, 0 \(\leq\) x \(\leq\) 1 (1) f(x) = x - 1, 1 < x \(\leq\) 2 (2)
y2 <- function(x) {
if(x >= 0 & x <= 1) 1 - x
else if(x > 1 & x <= 2) x - 1
}
library(ggplot2)
# call 1000 samples for the first function
set.seed(123)
x1 <- runif(1000, 0, 2)
df1 <- as.data.frame(x1)
df1$y1 <- sapply(x1, y1)
ggplot(df1, aes(y1)) + geom_histogram() +
ggtitle("the First PDF")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# call 1000 samples for the second function
set.seed(123)
x2 <- runif(1000, 0, 2)
df2 <- as.data.frame(x2)
df2$y2 <- sapply(x2, y2)
ggplot(df2, aes(y2)) + geom_histogram() +
ggtitle("the Second PDF")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# apply 1000 times mean of sample 10 with the first PDF
df3 <- sapply(1:1000, function(d, n) mean(sample(df1$y1, 10)))
df3 <- as.data.frame(df3)
ggplot() + geom_histogram(aes(df3)) +
ggtitle("Mean Distributions of Sample Size 10 out of the First PDF")
## Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# apply 1000 times mean of sample 10 with the second PDF
df4 <- sapply(1:1000, function(t, n) mean(sample(df2$y2, 10)))
df4 <- as.data.frame(df4)
ggplot() + geom_histogram(aes(df4)) +
ggtitle("Mean Distributions of Sample Size 10 out of the Second PDF")
## Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# apply 1000 times mean of sample 20 with the first PDF
df5 <- sapply(1:1000, function(d, n) mean(sample(df1$y1, 20)))
df5 <- as.data.frame(df5)
ggplot() + geom_histogram(aes(df5)) +
ggtitle("Mean Distributions of Sample Size 20 out of the First PDF")
## Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# apply 1000 times mean of sample 20 with the second PDF
df6 <- sapply(1:1000, function(t, n) mean(sample(df2$y2, 20)))
df6 <- as.data.frame(df6)
ggplot() + geom_histogram(aes(df6)) +
ggtitle("Mean Distributions of Sample Size 20 out of the Second PDF")
## Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# apply 1000 times mean of sample 50 with the first PDF
df7 <- sapply(1:1000, function(d, n) mean(sample(df1$y1, 50)))
df7 <- as.data.frame(df7)
ggplot() + geom_histogram(aes(df7)) +
ggtitle("Mean Distributions of Sample Size 50 out of the First PDF")
## Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# apply 1000 times mean of sample 50 with the second PDF
df8 <- sapply(1:1000, function(t, n) mean(sample(df2$y2, 50)))
df8 <- as.data.frame(df8)
ggplot() + geom_histogram(aes(df8)) +
ggtitle("Mean Distributions of Sample Size 50 out of the Second PDF")
## Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.