W4 Discussion - Central Limit Theorem

##Make this example reproducible
set.seed(0)

##Set parameters of gamma distribution (shape k; theta is passed to rgamma as the rate)
k <- 3
theta <- 5

##Generate 1000 random values that follow gamma distribution
x <- rgamma(n = 1000, shape = k, rate = theta)

##Calculate mean and standard deviation of dataset x
mean(x)

## [1] 0.5983854

sd(x)

## [1] 0.3559629
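
As a quick sanity check on the two values above, the sample mean and standard deviation can be compared with the theoretical moments of the gamma distribution. This is only a minimal sketch. It assumes, as the printed mean of about 0.6 suggests, that `rgamma` received 5 as the rate rather than the scale. The names `shape_k`, `rate_theta`, `theo_mean`, and `theo_sd` are illustrative and are not part of the original script.

```r
# Assumption: Gamma(shape = 3, rate = 5), matching rgamma(n = 1000, k, theta)
shape_k <- 3
rate_theta <- 5

theo_mean <- shape_k / rate_theta        # mean of a gamma: shape / rate
theo_sd <- sqrt(shape_k) / rate_theta    # sd of a gamma: sqrt(shape) / rate

c(mean = theo_mean, sd = theo_sd)
```

Under that assumption the theoretical mean is 0.6 and the theoretical standard deviation is about 0.346, both close to the sample values of 0.598 and 0.356.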

##Create histogram to show distribution of values
hist(x, main = "Gamma Distribution", xlab = "x", ylab = "Frequency")

#create vector to hold 1,000 sample means
n <- 1000
sample5 <- numeric(n)

#take 1,000 random samples of size 5 (with replacement) and store each mean
for (i in seq_len(n)) {
  sample5[i] <- mean(sample(x, 5, replace = TRUE))
}

#calculate mean and standard deviation of sample means
mean(sample5)

## [1] 0.5982778

sd(sample5)

## [1] 0.1571055

hist(sample5, main = "Sample Size of 5", xlab = "x", ylab = "Frequency" )

##create vector to hold 1,000 sample means
n <- 1000
sample30 <- numeric(n)

#take 1,000 random samples of size 30 (with replacement) and store each mean
for (i in seq_len(n)) {
  sample30[i] <- mean(sample(x, 30, replace = TRUE))
}

#calculate mean and standard deviation of sample means
mean(sample30)

## [1] 0.5951367

sd(sample30)

## [1] 0.06579251

hist(sample30, main = "Sample Size of 30", xlab = "x", ylab = "Frequency" )

##create 2x3 data matrix (filled column by column: mean, then sd, for each group)
CLTdata <- c(mean(x), sd(x), mean(sample5), sd(sample5), mean(sample30), sd(sample30))

CLT <- matrix(CLTdata, nrow = 2, ncol = 3)

##set row and column names (named `groups` so base sample() is not masked)
statfun <- c("Mean", "Standard Deviation")
groups <- c("Population", "Sample n = 5", "Sample n = 30")

rownames(CLT) <- statfun
colnames(CLT) <- groups

CLT

##                    Population Sample n = 5 Sample n = 30
## Mean                0.5983854    0.5982778    0.59513670
## Standard Deviation  0.3559629    0.1571055    0.06579251
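
The standard deviations in this table can be checked against the standard error the central limit theorem predicts for a sample mean, which is the population standard deviation divided by the square root of the sample size. The sketch below copies the printed values from the table rather than regenerating the data, so it does not touch the random number stream. The variable names are illustrative only.

```r
# Values copied from the table above
pop_sd <- 0.3559629
observed_se <- c(n5 = 0.1571055, n30 = 0.06579251)

# CLT prediction: sd of the sample mean = population sd / sqrt(sample size)
sizes <- c(n5 = 5, n30 = 30)
expected_se <- pop_sd / sqrt(sizes)

round(rbind(expected = expected_se, observed = observed_se), 4)
```

The predicted values are roughly 0.159 and 0.065, which agree closely with the simulated standard deviations of 0.157 and 0.066.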

#create vector to hold 1,000 sample medians
n <- 1000
smed5 <- numeric(n)

#take 1,000 random samples of size 5 (with replacement) and store each median
for (i in seq_len(n)) {
  smed5[i] <- median(sample(x, 5, replace = TRUE))
}

#calculate median and standard deviation of the sample medians
median(smed5)

## [1] 0.5268169

sd(smed5)

## [1] 0.1891658

hist(smed5, main = "Sample Size of 5", xlab = "x", ylab = "Frequency" )

##create vector to hold 1,000 sample medians
n <- 1000
smed30 <- numeric(n)

#take 1,000 random samples of size 30 (with replacement) and store each median
for (i in seq_len(n)) {
  smed30[i] <- median(sample(x, 30, replace = TRUE))
}

#calculate median and standard deviation of the sample medians
median(smed30)

## [1] 0.5155635

sd(smed30)

## [1] 0.07306791

hist(smed30, main = "Sample Size of 30", xlab = "x", ylab = "Frequency" )

##create 2x3 data matrix (filled column by column: median, then sd, for each group)
CLTmeddata <- c(median(x), sd(x), median(smed5), sd(smed5), median(smed30), sd(smed30))

CLTmed <- matrix(CLTmeddata, nrow = 2, ncol = 3)

##set row and column names
statfunmed <- c("Median", "Standard Deviation")
samplemed <- c("Population", "Sample n = 5", "Sample n = 30")

rownames(CLTmed) <- statfunmed
colnames(CLTmed) <- samplemed

CLTmed

##                    Population Sample n = 5 Sample n = 30
## Median              0.5207141    0.5268169    0.51556348
## Standard Deviation  0.3559629    0.1891658    0.07306791
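
For a rough comparison, the standard deviations of the sample medians can be set against the usual large-sample approximation for a sample median, 1 / (2 f(m) sqrt(n)), where m is the population median and f is the density at m. The sketch below assumes the population is exactly Gamma(shape = 3, rate = 5), whereas the script resamples from the 1,000 generated values, so only approximate agreement should be expected, especially at n = 5. All variable names are illustrative.

```r
# Assumption: the population is Gamma(shape = 3, rate = 5)
shape_k <- 3
rate_theta <- 5

# Theoretical median and the density at that point
pop_median <- qgamma(0.5, shape = shape_k, rate = rate_theta)
dens_at_median <- dgamma(pop_median, shape = shape_k, rate = rate_theta)

# Large-sample approximation: sd(median) ~ 1 / (2 * f(m) * sqrt(n))
sizes <- c(n5 = 5, n30 = 30)
approx_sd_median <- 1 / (2 * dens_at_median * sqrt(sizes))

# Observed values copied from the table above
observed_sd_median <- c(n5 = 0.1891658, n30 = 0.07306791)
round(rbind(approx = approx_sd_median, observed = observed_sd_median), 4)
```

The approximation gives roughly 0.18 for n = 5 and 0.074 for n = 30, in the same range as the simulated values of 0.189 and 0.073.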

https://www.statology.org/gamma-distribution-in-r/
https://www.statology.org/central-limit-theorem-in-r/

Although the central limit theorem is stated for sample means, the sample median shows the same behavior here, for three reasons. First, as the sample size increased from 5 to 30, each sample captured more of the values in the dataset, so the sample medians clustered more tightly around the population median of 0.5207. Second, the distribution of sample medians moved closer to a normal curve: more samples yielded a median near the population median, and medians became less frequent the farther they fell above or below it. In the histograms this appears as a narrower, more symmetric bell shape. Lastly, as the sample size increased, the standard deviation of the sample medians moved closer to zero, falling from 0.1892 at n = 5 to 0.0731 at n = 30, as shown in the median data matrix. Larger samples therefore estimate the population median more precisely, because the sample medians vary less from one sample to the next.
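
One way to put a number on how "normal" the histograms look is to compare the skewness of the sample medians at each sample size. The following is a minimal, self-contained sketch under stated assumptions: it regenerates a gamma dataset with the same parameters, so its random draws will not match the values printed earlier, and `x_demo`, `skewness`, and `resample_medians` are hypothetical helper names introduced for illustration only.

```r
set.seed(0)  # note: this RNG stream differs from the one used in the script

# Assumption: same setup as the script (1,000 gamma draws, shape 3, rate 5)
x_demo <- rgamma(n = 1000, shape = 3, rate = 5)

# Simple moment-based skewness: third central moment over the cubed sd
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

# 1,000 resampled medians for a given sample size
resample_medians <- function(size, reps = 1000) {
  replicate(reps, median(sample(x_demo, size, replace = TRUE)))
}

skew <- c(n5 = skewness(resample_medians(5)),
          n30 = skewness(resample_medians(30)))
skew
```

Values nearer zero indicate a more symmetric distribution. The larger sample size would be expected to give the smaller skewness, although a single simulation run does not guarantee it.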

Justin Nevins

2024-02-09