library(dplyr)
data("storms")
# For documentation about the variables, run ?storms
Simulating Sampling
In this activity, we will use simulation on a pre-built dataset to explore the Central Limit Theorem
Load the data, “storms”
Name the variable for wind
Since we are exploring only this single quantitative variable, we store it in its own object so we do not have to type storms$ each time
wind <- storms$wind
mean(wind)
[1] 53.63774
sd(wind)
[1] 26.18907
View a histogram of the population data
hist(wind, main = "Maximum Sustained Wind Speed (in knots)")
# What is the shape of the population distribution of wind?
Create a sample of the data using sample()
We will sample with replacement
sample(wind, size = 100, replace = TRUE)
[1] 35 35 45 60 90 90 85 110 55 75 25 25 155 40 55 70 60 80
[19] 65 40 15 145 70 20 35 35 65 65 70 120 100 55 45 20 70 110
[37] 35 50 145 45 25 40 40 35 55 60 65 30 45 25 45 135 65 60
[55] 45 30 75 55 35 45 70 30 60 55 80 75 25 60 30 90 70 90
[73] 30 55 30 25 40 50 65 60 45 55 25 40 75 25 40 30 35 30
[91] 70 70 75 50 30 25 30 45 20 25
mean(sample(wind, size = 100, replace = TRUE))
[1] 51.25
sd(sample(wind, size = 100, replace = TRUE))
[1] 27.64384
View the histogram of the sample distribution of wind
hist(sample(wind, size = 100, replace = TRUE), main = "Sample of Max Wind Speed (in knots)")
# What is the shape of the sample distribution of wind?
# What does this histogram show that is different than the population plot?
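Note that each call to sample() above draws a fresh random sample, so the mean, standard deviation, and histogram each describe a different set of 100 values. A minimal sketch, assuming we want all three summaries to describe the same sample, is to store the sample in an object first. The name one_sample and the seed value are illustrative assumptions, not part of the original activity.

```r
library(dplyr)
data("storms")
wind <- storms$wind

# Assumption: an arbitrary seed, used only so the draw can be reproduced
set.seed(2024)

# Draw one sample of 100 wind speeds and keep it (name is hypothetical)
one_sample <- sample(wind, size = 100, replace = TRUE)

# All three summaries now describe the same 100 values
mean(one_sample)
sd(one_sample)
hist(one_sample, main = "Sample of Max Wind Speed (in knots)")
```

Without set.seed(), the results will differ on every run, which is expected for random sampling.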
From repeated samples, create a SAMPLING distribution of the sample mean
r <- replicate(1000, mean(sample(wind, size = 100, replace = TRUE)))
mean(r)
[1] 53.64045
sd(r)
[1] 2.696383
View the histogram of the sampling distribution
hist(r, main = "Sampling Distribution of Mean Wind Speed (in knots)")
# What is the shape of the sampling distribution of wind?
# What does this histogram show that is different than the sample plot?
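The Central Limit Theorem also predicts the spread of the sampling distribution: the standard deviation of the sample means should be close to the population standard deviation divided by the square root of the sample size. With the values reported above, 26.18907 / sqrt(100) is about 2.619, which is near the simulated sd(r) of 2.696. The following is a rough sketch of that check; the names n and theoretical_se and the seed value are assumptions introduced here for illustration.

```r
library(dplyr)
data("storms")
wind <- storms$wind

n <- 100        # sample size used throughout the activity
set.seed(2024)  # assumption: arbitrary seed for reproducibility

# Simulate 1000 sample means, as in the activity
r <- replicate(1000, mean(sample(wind, size = n, replace = TRUE)))

# Standard error predicted by the Central Limit Theorem
theoretical_se <- sd(wind) / sqrt(n)

# Compare the simulated spread with the predicted one
c(simulated = sd(r), theoretical = theoretical_se)

# The center of the sampling distribution should sit near the population mean
c(simulated = mean(r), population = mean(wind))
```

The two pairs of values will not match exactly, since r is built from random draws, but they should be close. Here sd(wind) uses the n - 1 denominator, which makes a negligible difference for a dataset of this size.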