We can now draw the multistage sample, working down one level at a time.
# Stage 1: sample counties
sampled_counties <- sample(unique(pop$county), c_counties)
# Stage 2: sample districts within counties
sampled_districts <- pop %>%
  filter(county %in% sampled_counties) %>%
  distinct(county, district_id) %>%
  group_by(county) %>%
  slice_sample(n = c_districts) %>%
  ungroup()
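# Stage 3: sample schools within districts
# county is carried along so that districts sharing an ID across
# counties are never pooled into one group
sampled_schools <- pop %>%
  semi_join(sampled_districts, by = c("county", "district_id")) %>%
  distinct(county, district_id, school_id) %>%
  group_by(county, district_id) %>%
  slice_sample(n = c_schools) %>%
  ungroup()
# Stage 4: sample students within schools
# the full county/district/school key identifies each sampled school
sample_multistage <- pop %>%
  semi_join(sampled_schools, by = c("county", "district_id", "school_id")) %>%
  group_by(county, district_id, school_id) %>%
  slice_sample(n = c_students) %>%
  ungroup()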
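A quick way to check that the four stages nest as intended is to run them on a small made-up population and count the units at each level. The sketch below is illustrative only: `toy_pop`, its sizes, and the per-stage counts (2, 2, 2, 5) are hypothetical and not taken from the data above, and it assumes district and school IDs restart within each parent unit, which is why every stage keys on the full county/district/school path.

```r
library(dplyr)

set.seed(42)  # arbitrary seed so the toy example is reproducible

# Hypothetical toy population: 4 counties x 3 districts x 3 schools x 20 students.
# District and school IDs deliberately restart within each parent unit.
toy_pop <- expand.grid(
  student     = 1:20,
  school_id   = 1:3,
  district_id = 1:3,
  county      = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
) %>%
  as_tibble() %>%
  mutate(score = rnorm(n(), mean = 70, sd = 10))

# Stage 1: 2 counties
toy_counties <- sample(unique(toy_pop$county), 2)

# Stage 2: 2 districts per sampled county
toy_districts <- toy_pop %>%
  filter(county %in% toy_counties) %>%
  distinct(county, district_id) %>%
  group_by(county) %>%
  slice_sample(n = 2) %>%
  ungroup()

# Stage 3: 2 schools per sampled district
toy_schools <- toy_pop %>%
  semi_join(toy_districts, by = c("county", "district_id")) %>%
  distinct(county, district_id, school_id) %>%
  group_by(county, district_id) %>%
  slice_sample(n = 2) %>%
  ungroup()

# Stage 4: 5 students per sampled school
toy_sample <- toy_pop %>%
  semi_join(toy_schools, by = c("county", "district_id", "school_id")) %>%
  group_by(county, district_id, school_id) %>%
  slice_sample(n = 5) %>%
  ungroup()

# Count the distinct units that ended up in the sample at each level
design_check <- toy_sample %>%
  summarise(
    counties  = n_distinct(county),
    districts = n_distinct(paste(county, district_id)),
    schools   = n_distinct(paste(county, district_id, school_id)),
    students  = n()
  )

design_check
```

When every unit has at least as many sub-units as requested, the counts should equal the running products of the stage sizes: 2 counties, 2 x 2 = 4 districts, 4 x 2 = 8 schools, and 8 x 5 = 40 students. A shortfall usually means some unit had fewer sub-units than requested, since `slice_sample()` then returns everything available in that group.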
Lastly, we calculate the average score from the sample and compare it with the true population average.
sample_mean <- mean(sample_multistage$score)
results <- tibble(
  True_Population_Mean = round(true_mean, 2),
  Multistage_Sample_Mean = round(sample_mean, 2),
  Sample_Size = nrow(sample_multistage)
)
knitr::kable(results)
In this draw, the sample mean is close to the true mean, which illustrates how multistage sampling can approximate population characteristics without measuring every individual.
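A single draw cannot show how much the estimate moves from sample to sample. One way to get a feel for that is to repeat the whole multistage draw many times on a small made-up population and look at the spread of the resulting means. Everything in this sketch is an assumption for illustration: `toy_pop`, the helper `draw_mean()`, the stage sizes, and the choice of 200 repetitions are hypothetical and do not come from the analysis above.

```r
library(dplyr)

set.seed(2024)  # arbitrary seed so the toy example is reproducible

# Hypothetical balanced population: 6 counties x 4 districts x 4 schools x 30 students.
# Each school gets its own random shift so that scores cluster within schools.
toy_pop <- expand.grid(
  student     = 1:30,
  school_id   = 1:4,
  district_id = 1:4,
  county      = LETTERS[1:6],
  stringsAsFactors = FALSE
) %>%
  as_tibble() %>%
  group_by(county, district_id, school_id) %>%
  mutate(score = rnorm(n(), mean = 70 + rnorm(1, sd = 5), sd = 10)) %>%
  ungroup()

# Hypothetical helper: run one four-stage draw and return the sample mean
draw_mean <- function(pop, k_counties, k_districts, k_schools, k_students) {
  counties <- sample(unique(pop$county), k_counties)

  districts <- pop %>%
    filter(county %in% counties) %>%
    distinct(county, district_id) %>%
    group_by(county) %>%
    slice_sample(n = k_districts) %>%
    ungroup()

  schools <- pop %>%
    semi_join(districts, by = c("county", "district_id")) %>%
    distinct(county, district_id, school_id) %>%
    group_by(county, district_id) %>%
    slice_sample(n = k_schools) %>%
    ungroup()

  students <- pop %>%
    semi_join(schools, by = c("county", "district_id", "school_id")) %>%
    group_by(county, district_id, school_id) %>%
    slice_sample(n = k_students) %>%
    ungroup()

  mean(students$score)
}

# Repeat the draw 200 times and summarise the resulting sample means
sample_means <- replicate(200, draw_mean(toy_pop, 3, 2, 2, 10))

tibble(
  true_mean            = mean(toy_pop$score),
  mean_of_sample_means = mean(sample_means),
  sd_of_sample_means   = sd(sample_means)
)
```

The toy population is balanced, so every student has the same chance of selection and the plain sample mean is a reasonable estimator here. When counties, districts, or schools differ in size, an unweighted mean from a design like this is generally not unbiased, and sampling weights would be worth considering.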