The Law of Large Numbers describes the phenomenon that, in a random experiment, the relative frequency of an event stabilizes as the number of trials increases. In other words, when the number of trials is sufficiently large, the observed frequency of an event is almost equal to its probability.
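As a quick illustration (my own sketch, not part of the assignment code; the seed, the success probability of 0.4, and the number of flips are arbitrary choices), the running proportion of successes in simulated Bernoulli trials settles near the true probability as the number of trials grows:
set.seed(1)
flips <- rbinom(n = 10000, size = 1, prob = 0.4)   # 10000 Bernoulli(0.4) trials
running_freq <- cumsum(flips) / seq_along(flips)   # observed frequency after each trial
plot(running_freq, type = "l",
     main = "Running Frequency of Successes",
     xlab = "Number of trials", ylab = "Frequency")
abline(h = 0.4, lty = 2)                           # the true probability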
The Central Limit Theorem (CLT) states that, largely regardless of the underlying distribution, the sum of many independent random variables is approximately normally distributed. In other words, even if the individual variables are not themselves normally distributed, as long as enough of them are added together, their (suitably standardized) sum exhibits the characteristics of a normal distribution.
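A minimal sketch of this idea (again my own illustration; the exponential distribution, the 50 draws per sum, and the 5000 replications are arbitrary choices) sums draws from a clearly non-normal distribution and shows that the sums are approximately bell-shaped:
set.seed(2)
sums <- replicate(5000, sum(rexp(n = 50, rate = 1)))   # 5000 sums of 50 exponential draws
hist(sums, breaks = 40,
     main = "Sums of 50 Exponential(1) Draws",
     xlab = "Sum")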
I believe the two results have these commonalities: both concern random phenomena, and both require a sufficiently large number of observations. They differ in their assumptions and in what they describe. Some versions of the Law of Large Numbers do not require the variables to be identically distributed, whereas the classical Central Limit Theorem assumes independent and identically distributed variables. More importantly, the Law of Large Numbers describes the convergence of a single quantity, the sample mean (or frequency), to a fixed value, while the Central Limit Theorem describes the limiting distribution of the sum (or mean) of many random variables.
In R, the binomial distribution describes the number of successes in a series of independent and identically distributed Bernoulli trials. Each trial has only two possible outcomes, success (typically coded as 1) or failure (typically coded as 0), and the probability of success is the same for every trial.
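Before the simulation below, a small sketch (my own addition, using the same parameters size = 20 and prob = 0.4 as the code that follows) shows the Binomial(20, 0.4) probability mass function with dbinom():
barplot(dbinom(x = 0:20, size = 20, prob = 0.4),
        names.arg = 0:20,
        main = "Binomial(20, 0.4) PMF",
        xlab = "Number of successes", ylab = "Probability")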
rm(list = ls())  # clear the workspace
gc()             # run garbage collection
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 517433 27.7 1148188 61.4 660382 35.3
## Vcells 939519 7.2 8388608 64.0 1769617 13.6
cat("\f")
set.seed(40)  # fix the seed so the simulation is reproducible
mydata <- rbinom(20, size = 20, prob = 0.4)  # 20 draws from Binomial(size = 20, prob = 0.4)
head(mydata)
## [1] 9 11 9 5 6 8
library("psych")
describe(mydata)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 20 7.7 2.3 7.5 7.5 2.22 5 12 7 0.47 -1.05 0.51
hist(x = mydata, main = "Histogram of the Binomial Distribution", xlab = "")
mu <- mean(mydata)
mu
## [1] 7.7
sigma <- sd(mydata)
sigma
## [1] 2.29645
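As a sanity check (my own addition), these sample values can be compared with the theoretical moments of a Binomial(20, 0.4) distribution, n*p = 8 and sqrt(n*p*(1-p)) ≈ 2.19; with only 20 observations, deviations of this size are expected.
20 * 0.4              # theoretical mean: 8 (sample mean was 7.7)
sqrt(20 * 0.4 * 0.6)  # theoretical sd: about 2.19 (sample sd was about 2.30)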
?matrix
## starting httpd help server ... done
?rep
z <- matrix(data = rep(x = 0, times = 10000), nrow = 10000, ncol = 1)
z[1:16]
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
describe(z)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 10000 0 0 0 0 0 0 0 0 NaN NaN 0
for (i in 1:10000){
  # resample 100 values from mydata (with replacement) and store the sample mean
  z[i,] <- mean(sample(x = mydata, size = 100, replace = TRUE))
}
z[1:16]
## [1] 7.49 7.58 7.65 8.16 7.63 7.72 8.18 7.56 7.78 7.76 7.77 7.68 7.62 8.00 7.10
## [16] 7.64
describe(z)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 10000 7.7 0.22 7.7 7.7 0.22 6.8 8.54 1.74 0.07 -0.01 0
hist(z, xlab = "", main = "Histogram of Sample Means")
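This is what the CLT predicts for the mean of 100 draws: the standard deviation of the simulated means should be close to sigma / sqrt(100). The short check below is my own addition and reuses the z and sigma objects defined above.
sigma / sqrt(100)  # predicted sd of the sample mean: about 0.23 (describe(z) reported 0.22)
mean(z)            # should be close to mu = 7.7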
percent25 <- quantile(mydata, .25)
print(percent25)
## 25%
## 6
percent25_matrix <- matrix(data = rep(x = 0, times = 10000), nrow = 10000, ncol = 1)
describe(percent25_matrix)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 10000 0 0 0 0 0 0 0 0 NaN NaN 0
for (i in 1:10000){
  # note: the unnamed .25 is matched to mean()'s trim argument, so each entry is
  # a 25% trimmed mean of the resample, not its 25th percentile
  percent25_matrix[i,] <- mean(sample(x = mydata, size = 100, replace = TRUE), .25)
}
describe(percent25_matrix)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 10000 7.4 0.28 7.4 7.39 0.27 6.34 8.54 2.2 0.1 0.15 0
hist(percent25_matrix, xlab = "", main = "Histogram of the Sample Means")
The 25th percentile of this population (the original 20 observations) is 6. The statistic simulated by repeated sampling above, however, averages about 7.4.
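As noted in the comment inside the loop above, mean(sample(...), .25) computes a 25% trimmed mean of each resample rather than its 25th percentile, which is why the simulated values center near 7.4 instead of near 6. A sketch of what I believe was intended, bootstrapping the 25th percentile itself with quantile(), would look like the following (it reuses mydata from above):
percent25_boot <- numeric(10000)
for (i in 1:10000) {
  # 25th percentile of each resample of size 100
  percent25_boot[i] <- quantile(sample(x = mydata, size = 100, replace = TRUE),
                                probs = 0.25)
}
mean(percent25_boot)  # should now be close to the sample 25th percentile of 6
hist(percent25_boot, xlab = "", main = "Histogram of Bootstrapped 25th Percentiles")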