Simulating Sampling

Author

Rachel Saidi

In this activity, we will use simulation on a pre-built dataset to explore the Central Limit Theorem.

Load the data, “storms”

library(dplyr)
data("storms")
?storms  # documentation about variables

Name the variable for wind

Since we are exploring only this single quantitative variable, we will store it in its own vector so that we do not need to type storms$ each time.

wind <- storms$wind
mean(wind)
[1] 53.63774
sd(wind)
[1] 26.18907

View a histogram of the population data

hist(wind, main = "Maximum Sustained Wind Speed (in knots)")

# What is the shape of the population distribution of wind?

Create a sample of the data using sample

We will draw a sample of 100 values with replacement.

sample(wind, size = 100, replace = TRUE)
  [1]  35  35  45  60  90  90  85 110  55  75  25  25 155  40  55  70  60  80
 [19]  65  40  15 145  70  20  35  35  65  65  70 120 100  55  45  20  70 110
 [37]  35  50 145  45  25  40  40  35  55  60  65  30  45  25  45 135  65  60
 [55]  45  30  75  55  35  45  70  30  60  55  80  75  25  60  30  90  70  90
 [73]  30  55  30  25  40  50  65  60  45  55  25  40  75  25  40  30  35  30
 [91]  70  70  75  50  30  25  30  45  20  25
mean(sample(wind, size = 100, replace = TRUE))
[1] 51.25
sd(sample(wind, size = 100, replace = TRUE))
[1] 27.64384
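
Each call to sample() above draws a new random sample, so the printed values, the mean, and the standard deviation each describe a different set of 100 observations. A minimal sketch of one way to keep them consistent is to store a single sample and reuse it. The name wind_sample and the seed value are assumptions chosen for illustration, not part of the original activity.

```r
library(dplyr)   # provides the storms dataset
data("storms")
wind <- storms$wind

set.seed(123)    # arbitrary seed, used only so the draw can be reproduced

# Draw one sample of 100 values and keep it (wind_sample is a hypothetical name)
wind_sample <- sample(wind, size = 100, replace = TRUE)

mean(wind_sample)  # mean of this one stored sample
sd(wind_sample)    # standard deviation of the same sample

# Histogram of the same 100 values
hist(wind_sample, main = "Sample of Max Wind Speed (in knots)")
```

With the sample stored, the summary statistics and the histogram all refer to the same 100 values. The numbers will differ from the output shown above, because those came from separate draws.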

View the histogram of the sample distribution of wind

hist(sample(wind, size = 100, replace = TRUE), main = "Sample of Max Wind Speed (in knots)")

# What is the shape of the sample distribution of wind?

# What does this histogram show that is different than the population plot?

From repeated samples, create a SAMPLING distribution of the sample mean

r <- replicate(1000, mean(sample(wind, size=100, replace=TRUE)))
mean(r)
[1] 53.64045
sd(r)
[1] 2.696383
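
The spread of the simulated sample means can be compared with the standard theoretical result that the standard deviation of a sample mean is approximately the population standard deviation divided by the square root of the sample size. The sketch below does that arithmetic with the values printed above; the variable names are assumptions chosen for illustration.

```r
# Values copied from the output earlier in this activity
pop_sd <- 26.18907   # sd(wind), the population standard deviation
n      <- 100        # size of each sample
sim_sd <- 2.696383   # sd(r), from the 1000 simulated sample means

# Theoretical standard error of the mean: sigma / sqrt(n)
theoretical_se <- pop_sd / sqrt(n)
theoretical_se

# A ratio close to 1 means the simulation agrees with the theory
sim_sd / theoretical_se
```

The theoretical value is about 2.62, close to the simulated 2.70. A different run of replicate() would give a slightly different simulated value.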

View the histogram of the sampling distribution

hist(r, main = "Sampling Distribution of Mean Wind Speed (in knots)")

# What is the shape of the sampling distribution of wind? 

# What does this histogram show that is different than the sample plot?
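
As an optional extension, the same simulation can be repeated for several sample sizes to see how the sampling distribution changes. This is a sketch only: the helper sim_means, the sample sizes, and the seed are assumptions chosen for illustration, not part of the original activity.

```r
library(dplyr)   # provides the storms dataset
data("storms")
wind <- storms$wind

set.seed(2024)   # arbitrary seed for reproducibility

# Hypothetical helper: simulate `reps` sample means for samples of size n
sim_means <- function(n, reps = 1000) {
  replicate(reps, mean(sample(wind, size = n, replace = TRUE)))
}

sizes <- c(5, 30, 100)                     # sample sizes chosen for illustration
means_by_size <- lapply(sizes, sim_means)  # one vector of 1000 means per size

# Spread of each simulated sampling distribution
sapply(means_by_size, sd)

# One histogram per sample size, side by side
old_par <- par(mfrow = c(1, 3))
for (i in seq_along(sizes)) {
  hist(means_by_size[[i]],
       main = paste("n =", sizes[i]),
       xlab = "Sample mean of wind (knots)")
}
par(old_par)  # restore the previous plotting layout
```

# How do the center, spread, and shape of these histograms change as n increases?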