\[ f(x) = \left\{ \begin{array}{1 1} x \quad 0 \leq x < 1 \\ 2-x \quad 1 \leq x \leq 2 \end {array} \right. \]
This function can be simply defined as below, which can be later used in R’s built in ‘sample’ method to generate samples
fun1 <- function(x){
if(0 <= x && x <= 2){
if(x <= 1){
return(x)
} else {
return(2-x)
}
}
}
However, let’s use the Inverse Sampling Technique here, to generate samples :
Create an inverse cdf function for this distribution:
Anti-derivative for the above function is:
\[ F(x) = \left\{ \begin{array}{1 1} \frac{x^2}{2} \quad 0 \leq x < 1 \\ 2x - \frac{x^2}{2} \quad 1 \leq x \leq 2 \end{array} \right. \]
The inverse of the anti-derivative: \[ F^{-1}(x) = \left\{ \begin{array}{1 1} \sqrt{2y} \quad 0 \leq y < \frac{1}{2} \\ 2 - \sqrt{2(1 - y)} \quad \frac{1}{2} \leq y \leq 1 \end{array} \right. \]
The Inverse CDF function:
inverseCDF <- function(y){
resp <- 0
if (0 <= y && y <= 1) {
if ( y <= 0.5) {
resp = (sqrt(2*y))
} else {
resp = (2 - sqrt(2 * (1-y)) )
}
}
else {
resp = 0 # Out of range !
}
return (resp)
}
Lets apply the above inverse cdf function to a vector of randomly selected values between 0 and 1, and plot those data points.
require(ggplot2)
## Loading required package: ggplot2
sample.data <- data.frame(X = sapply(runif(10000), inverseCDF))
print(head(sample.data))
## X
## 1 0.2141388
## 2 1.2601801
## 3 0.9503347
## 4 1.6469374
## 5 1.4754035
## 6 1.6479129
ggplot(data = sample.data, aes(sample.data$X)) + geom_histogram(binwidth=.05) + ggtitle("Triangular PDF")
\[ f(x) = \left\{ \begin{array}{1 1} 1-x \quad 0 \leq x \leq 1 \\ x-1 \quad 1 < x \leq 2 \end {array} \right. \]
fun2 <- function(x) {
resp <- 0
if (0 <= x && x <= 2) {
if (x <= 1) {
resp <- (1 - x)
}
else {
resp <- (x - 1)
}
}
return (resp)
}
First create a vector of probabilities using the above function.And use that as ‘prob’ for R-built in function ‘sample’, and plot.
#simply, create vector of values between 0 and 2
x <- seq(from = 0, to = 2, length.out = 10000)
#create vector of probabilities, which can be used later for sampling.
funx.prob <- sapply(x, fun2)
#Now, create a sample
sample.df <- data.frame(X = sample(x, 10000, replace=TRUE, prob=funx.prob))
ggplot(sample.df, aes(sample.df$X)) + geom_histogram(binwidth=.05) + ggtitle("Inverted Traingular PDF")
distributionOfMeans <- function(n, pdf) {
#Calculates the mean of 1000 samples of size n for the given pdf
#plots the distribution of the mean.
x <- seq(0, 2, by=0.01)
funx.prob <- sapply(x, pdf)
means <- c()
for(i in 1:1000) {
sampleiter <- sample(x, n, replace=TRUE, prob=funx.prob)
means <- c(means, mean(sampleiter))
}
print(paste(("The mean of the samplings is: "), mean(means)))
print(paste(("The standard deviation of the samplings is: "), sd(means)))
means.df <- data.frame(means)
ggplot(means.df, aes(means)) + geom_histogram(binwidth=.025) + xlab("Mean of the Sample") + ggtitle(paste("Distribution of Means;PDF=", deparse(substitute(pdf)), ";Samples=1000;Sample Size=", n))
}
Client Calls:
Lets try with the sample size of 20
distributionOfMeans(n=20, pdf=fun1)
## [1] "The mean of the samplings is: 1.0051895"
## [1] "The standard deviation of the samplings is: 0.0923888252700768"
distributionOfMeans(n=20, pdf=fun2)
## [1] "The mean of the samplings is: 1.002066"
## [1] "The standard deviation of the samplings is: 0.159947310969226"
Lets try with the sample size of 10
distributionOfMeans(n=10, pdf=fun1)
## [1] "The mean of the samplings is: 1.0021"
## [1] "The standard deviation of the samplings is: 0.130196771078238"
distributionOfMeans(n=10, pdf=fun2)
## [1] "The mean of the samplings is: 0.994664"
## [1] "The standard deviation of the samplings is: 0.222111522962301"
From the above, its evident that even for reasonably small sample sizes such as 10, Central Limit Theorem holds true - The means of the sampling distributions from ‘any’ probability distribution, forms a bell shaped curve (‘normally distributed’ around the mean of the original distribution)