Module 4 Discussion Post

Please Google and describe the Law of Large Numbers in your own words.

Law of Large Numbers: The law of large numbers states that if you repeat the same experiment a large number of times, the average of the results should be close to the expected value. The average gets closer to the expected value as the number of trials continues to grow. The more trials there are, the less impact outlier results have on the average. For example, let’s say you flip a coin 3 times and it lands on heads each time. That is 100% heads, even though you know the probability of landing on heads is 50%. If you flip the coin 200 times, the proportion of heads will be much closer to 50% than to 100% by the end of the 200 trials.
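The coin-flip example can be checked with a quick simulation. This is only a minimal sketch: the seed, the number of flips, and the variable names are my own choices for illustration, not part of the assignment.

```r
# Simulate fair coin flips and track the running proportion of heads.
set.seed(1)                                      # arbitrary seed, for reproducibility only
n_flips <- 10000                                 # hypothetical number of flips
flips <- rbinom(n_flips, size = 1, prob = 0.5)   # 1 = heads, 0 = tails
running_prop <- cumsum(flips) / seq_along(flips) # proportion of heads after each flip

# Proportion of heads after 3, 200, and 10000 flips
running_prop[c(3, 200, n_flips)]

# Distance from the expected value of 0.5 at the end of the run
final_gap <- abs(running_prop[n_flips] - 0.5)
final_gap
```

The early proportions can sit far from 0.5, while the later ones should settle close to it, which is the behavior the LLN describes.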

Please explain CLT in your own words. You can, and should, read your textbook and/or online references to understand what the CLT is, its uses, et cetera. Furthermore, if you find any useful resource, include it in your post so that the rest of the class can have a look at it too.

The Central Limit Theorem (CLT) states that when the observations of an experiment are independent and the sample size is “sufficiently large”, the sampling distribution of a sample statistic such as the sample mean or the sample proportion (p-hat) is approximately normal. For a sample proportion, a common rule of thumb for “sufficiently large” is that np must be greater than or equal to 10 and n(1-p) must be greater than or equal to 10. The sampling distribution will be approximately normal regardless of whether the population itself is skewed or normal.
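Written as formulas, the statement above looks roughly like this. This is the standard textbook form rather than anything specific to this post, and it assumes the usual notation: p is the population proportion, μ and σ² are the population mean and variance, and n is the sample size. The `\overset` and `\text` commands assume the amsmath package.

```latex
\[
\hat{p} \;\overset{\text{approx.}}{\sim}\; N\!\left(p,\ \frac{p(1-p)}{n}\right)
\quad \text{when } np \ge 10 \text{ and } n(1-p) \ge 10,
\]
\[
\bar{X} \;\overset{\text{approx.}}{\sim}\; N\!\left(\mu,\ \frac{\sigma^{2}}{n}\right)
\quad \text{for sufficiently large } n.
\]
```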

https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_probability/BS704_Probability12.html

What are the similarities and differences between LLN and CLT? Write a few lines.

The LLN and CLT both describe how the sample mean behaves as the sample size or number of trials grows, and both rely on independent observations. In both, outlier results carry less and less weight in the average as more data are collected. The difference is in what each one tells us. The LLN says that the sample mean gets closer to the population mean as the number of trials grows, but it says nothing about the shape of the distribution. The CLT describes the shape: for a sufficiently large sample size, the distribution of the sample mean is approximately normal, centered on the population mean, with a spread that shrinks as the sample size grows.

Pick any distribution apart from normal, uniform, or Poisson. You can look up the distribution on Wikipedia and/or read how to implement the distribution in R (what parameters are required to generate the distribution). Please describe this distribution first in 5 lines.

Exponential Distribution: The exponential distribution is a continuous probability distribution that models the amount of time until a specific event occurs. An exponential random variable takes many small values and relatively few large values, so the distribution is right-skewed. The exponential distribution is also considered “memoryless” because the time until the next event does not depend on how much time has already elapsed. The only parameter needed to generate the exponential distribution is the rate, lambda (λ), and the mean of the distribution is 1/λ. In R, rexp(n, rate) generates n exponential values.
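The properties described above can be checked numerically. This is a minimal sketch under my own assumptions: the rate of 2, the seed, and the cut-off values s0 and t0 are hypothetical and chosen only for illustration.

```r
set.seed(2)                      # arbitrary seed, for reproducibility only
rate <- 2                        # hypothetical rate: 2 events per unit of time
x <- rexp(100000, rate = rate)   # 100,000 simulated waiting times

mean(x)                          # should be near 1 / rate = 0.5
mean(x < 1 / rate)               # share of draws below the mean (more small values than large)

# Memoryless property: P(X > s0 + t0 | X > s0) should be close to P(X > t0)
s0 <- 0.4
t0 <- 0.3
p_cond <- mean(x > s0 + t0) / mean(x > s0)
p_uncond <- mean(x > t0)
c(conditional = p_cond, unconditional = p_uncond, theory = exp(-rate * t0))
```

If the conditional and unconditional proportions come out close to each other and to exp(-rate * t0), the simulated data behave the way the memoryless property says they should.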

Apply the CLT on the sample mean of this chosen distribution in R. Does the central limit theorem hold as expected?

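Example: You are watching a busy intersection near your apartment and recording how frequently blue cars drive by. You observe for an hour and determine that, on average, 7 blue cars drive through the intersection every 5 minutes. This gives a rate of λ = 7/5 = 1.4 blue cars per minute, so the waiting time between blue cars can be modeled as exponential with a mean of 1/λ = 5/7 ≈ 0.71 minutes.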

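set.seed(24)         # fix the random seed so the results are reproducible
# ?rexp              # uncomment to open the help page for rexp()
library(conflicted)  # turn function-name conflicts between packages into errors
library(tidyverse)   # loads ggplot2, which is used for the histogram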

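blue <- 7                # average number of blue cars observed...
time <- 5                # ...per 5-minute window
lambda <- blue / time    # rate: 1.4 blue cars per minute
n <- 30
nosim <- 2000            # n * nosim = 60,000 simulated values in total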

# Simulated waiting times (in minutes) between blue cars
num_bluecars <- rexp(n * nosim, rate = lambda)
ggplot(data.frame(wait = num_bluecars), aes(x = wait)) +
  geom_histogram(colour = "black", boundary = 0, bins = 30)

## Mean of num_bluecars 
print(mean(num_bluecars))

## [1] 0.7173866

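# Build 1,000 samples of size 75 by resampling the simulated values with replacement
car_matrix <- matrix(0, nrow = 75, ncol = 1000)
for (i in 1:1000) {
  car_matrix[, i] <- sample(num_bluecars, 75, replace = TRUE)
}

my_means <- colMeans(car_matrix)  # one sample mean per column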

## Mean of my_means
print(mean(my_means))

## [1] 0.7141593

hist(my_means)

plot(density(my_means))

The Central Limit Theorem (CLT) holds reasonably well here: both the histogram and the density plot of my_means show a roughly bell-shaped, approximately normal curve, even though the underlying exponential data are strongly right-skewed. Initially, I did not use enough samples (30*1000) for the normal curve to appear clearly. After increasing the number of samples, the normal shape became much more visible in both plots. Also, the mean of my_means (0.7142) is close to the mean of num_bluecars (0.7174) and to the theoretical mean 1/λ = 5/7 ≈ 0.7143, which shows that the sample means are centered on the population mean, as both the LLN and the CLT predict.
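One further check is to compare the spread of the sample means with what the CLT predicts, and to look at a normal Q-Q plot. The sketch below is self-contained and makes one simplifying assumption that differs from the code above: it draws each sample of 75 directly from the exponential distribution instead of resampling from a simulated pool, so its numbers will not match the output shown earlier. The names sample_means, theory_mean, and theory_sd are my own.

```r
set.seed(24)                       # same seed as above, but the draws differ
lambda <- 7 / 5                    # rate from the example: 7 blue cars per 5 minutes
n <- 75                            # size of each sample
n_means <- 1000                    # number of sample means

# Draw each sample directly from the exponential distribution
sample_means <- replicate(n_means, mean(rexp(n, rate = lambda)))

theory_mean <- 1 / lambda              # population mean, about 0.714
theory_sd <- (1 / lambda) / sqrt(n)    # standard error predicted by the CLT, about 0.082

c(sim_mean = mean(sample_means), theory_mean = theory_mean)
c(sim_sd = sd(sample_means), theory_sd = theory_sd)

# Points falling close to the reference line suggest approximate normality
qqnorm(sample_means)
qqline(sample_means)
```

If the simulated mean and standard deviation land near the theoretical values and the Q-Q plot is close to a straight line, that supports the conclusion that the CLT holds for this distribution.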

Chris Toomey

2024-02-08