Learning objectives

  1. Define and identify the various types of sampling strategies
  2. Given a quantity to estimate, choose the sampling strategy that will select the sample most representative of the population of interest

Key vocabulary

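  1. A simple random sample is a sample where the following two properties hold: (a) every individual in the population has the same chance of being selected, and (b) every possible group of individuals of the chosen sample size has the same chance of being selected.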
  2. A systematic random sample can be used to choose a sample when there is a convenient list of the individuals in a population, and the order of that list is not associated with the quantity we would like to estimate. We pick a random starting point on the list and then select every k-th individual after it. See textbook for further explanation.

  3. A stratified sample is used when observations in the population fall into groups that are associated with the quantity we would like to estimate. In this case the observations are divided into their respective groups, or strata, and then the desired number of observations is chosen via simple random sampling from each stratum. Stratified sampling should be used when observations within a stratum are very similar, but observations across strata are very different.

  4. Cluster sampling is very similar to simple random sampling, except that instead of randomly selecting individual observations we randomly select groups of observations, or clusters, and include every observation in the chosen clusters. We use cluster sampling when we have limited resources to choose a sample and the differences between observations within a cluster are large, but the clusters themselves look very similar to one another.

  5. Multi-stage sampling is very similar to cluster sampling, except that after choosing a cluster or clusters, we then use simple random sampling to choose a certain number of observations from within each chosen cluster.
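
The sampling strategies defined above can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the function names, the toy population of 6 groups with 20 individuals each, and the sample sizes are all made up for the example. The same groups are treated as strata in one call and as clusters in another purely to keep the example short.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, rng):
    """Every group of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th individual on the list."""
    k = len(population) // n      # sampling interval (assumes n <= len(population))
    start = rng.randrange(k)      # random starting point in the first interval
    return [population[start + i * k] for i in range(n)]


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Take a simple random sample of fixed size from every stratum."""
    strata = defaultdict(list)
    for individual in population:
        strata[stratum_of(individual)].append(individual)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep everyone in them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [individual for name in chosen for individual in clusters[name]]


def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Randomly choose clusters, then take a simple random sample in each."""
    chosen = rng.sample(list(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return sample


# Hypothetical population: 6 groups of 20 individuals, each labeled (group, id).
rng = random.Random(0)  # fixed seed so the run is repeatable
groups = {f"g{g}": [(f"g{g}", i) for i in range(20)] for g in range(6)}
population = [individual for members in groups.values() for individual in members]

srs = simple_random_sample(population, 12, rng)
systematic = systematic_sample(population, 12, rng)
stratified = stratified_sample(population, lambda ind: ind[0], 2, rng)
cluster = cluster_sample(groups, 2, rng)
multistage = multistage_sample(groups, 2, 5, rng)

for name, sample in [("simple random", srs), ("systematic", systematic),
                     ("stratified", stratified), ("cluster", cluster),
                     ("multi-stage", multistage)]:
    print(name, "sample size:", len(sample))
```

Notice that the stratified sample always contains individuals from every group, while the cluster and multi-stage samples contain individuals from only the groups that happened to be chosen. That difference is the main thing to weigh when deciding which strategy fits a case study.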

Case studies

Case study 1: HIV and rural Kenyan villages

Your team is a group of epidemiologists (disease scientists) who are interested in estimating the percentage of people in rural eastern Kenyan villages who are HIV positive. There are several dozen of these villages throughout Kenya, and a lack of resources makes it infeasible to test each person in each village for HIV. You therefore decide to construct a sample that closely resembles this population and then use the percentage of HIV positive people within the sample as an estimate of the percentage of villagers who are HIV positive.

Your job is to devise a sampling strategy that will create a representative sample (a sample which accurately represents the population).

As you research these villages, you find out several facts that may be of assistance in constructing an appropriate sampling strategy:

  • The villages cover a wide geographic area but are relatively similar in terms of religion practiced, food eaten, and government structures
  • You have a limited amount of funding to send medical staff to these villages, so your sampling strategy must be cost effective
  • The average size of a village is about 40 people
  • You want a sample size of at least 100 so that your estimate will be valid

Guiding questions

  1. What pieces of information given above are important in selecting a sample most representative of the population we would like to survey?
  2. Which of the sampling strategies discussed in class is most likely to choose a representative sample given the information presented?
  3. How can you apply the sampling strategy you decided on in (2) to fit the context of the case study?

Case study 2: Drug arrests across cities

Your team is a group of journalists who are interested in estimating the percentage of arrests that involve possession of drugs. You are mostly interested in estimating this percentage within major cities, so you decide to focus on New York, Chicago, Houston, Los Angeles, and Seattle. Since you work for a newspaper, your resources are limited and you cannot go through every police record in each of these cities. You therefore decide to create a sample that is representative of the population and then use the percentage of arrests due to drugs in your sample as your estimate of the population percentage.

Your job is to devise a sampling strategy that will create a representative sample (a sample which accurately represents the population).

As you research this project you come across several facts that may be of assistance to you as you tackle this problem:

  • The percentage of arrests due to drugs most likely is not the same across cities
  • In your news story, it is important that each city is represented according to its population.
  • You have a limited budget, so you plan to look at 2000 police records total across all the cities

Guiding questions

  1. What pieces of information given above are important in selecting a sample most representative of the population we would like to survey?
  2. Which of the sampling strategies discussed in class is most likely to choose a representative sample given the information presented?
  3. How can you apply the sampling strategy you decided on in (2) to fit the context of the case study?

Case study 3: Facebook advertisements

Your team is a group of Facebook computer engineers who are designing a new advertisement interface for Facebook. Rather than immediately implementing the advertising interface for all 1.2 billion Facebook users, you decide you will choose a random sample of 30,000 users from each country in the world that uses Facebook to be included in the new advertising interface.

Your job is to devise a sampling strategy that will create a representative sample (a sample which accurately represents the population).

As you research how to best construct this sample you come across several pieces of information that may be useful to consider:

  1. Facebook users are extremely diverse so it is important that your sample represents this diversity.
  2. Since this new advertising platform will be used to sell items to Facebook users it is important that the sample accurately represents the entire population.
  3. As an engineer, you have access to the accounts of all 1.2 billion Facebook users.

Guiding questions

  1. What pieces of information given above are important in selecting a sample most representative of the population we would like to survey?
  2. Which of the sampling strategies discussed in class is most likely to choose a representative sample given the information presented?
  3. How can you apply the sampling strategy you decided on in (2) to fit the context of the case study?

Case study 4: High school suspensions in Denver

Your team is a group of social scientists who would like to estimate the percentage of high school students who are suspended at least once during the 2014-2015 school year. Rather than survey every student from every high school in Denver, you decide you will take a random sample of students and then use the percentage of students in your sample who have been suspended at least once as your estimate of the population percentage.

Your job is to devise a sampling strategy that will create a representative sample (a sample which accurately represents the population).

As you are researching how to best do this you come across several pieces of information that may be useful to you:

  • Student suspension rates vary greatly depending upon the school
  • For your research to be valid it is important that students from each school in Denver be sampled

Guiding questions

  1. What pieces of information given above are important in selecting a sample most representative of the population we would like to survey?
  2. Which of the sampling strategies discussed in class is most likely to choose a representative sample given the information presented?
  3. How can you apply the sampling strategy you decided on in (2) to fit the context of the case study?

Case study 5: Music and Denver neighborhoods

Your group is a team of Spotify music consultants who have been tasked with surveying music tastes throughout Denver in order to maximize advertising revenue. Specifically, you would like to know the percentage of users on Spotify that use the pre-made playlists Spotify comes up with. Since you cannot survey every Spotify user in Denver, you decide to create a sample of 30,000 people who use Spotify and then calculate the percentage of your sample who use pre-made playlists as your estimate of the population percentage.

Your job is to devise a sampling strategy that will create a representative sample (a sample which accurately represents the population).

As you begin thinking about how to best construct a representative sample you come across some information that may be relevant:

  • Within different neighborhoods music tastes of households are very similar
  • Between different neighborhoods music tastes vary greatly

Guiding questions

  1. What pieces of information given above are important in selecting a sample most representative of the population we would like to survey?
  2. Which of the sampling strategies discussed in class is most likely to choose a representative sample given the information presented?
  3. How can you apply the sampling strategy you decided on in (2) to fit the context of the case study?

Case study 6: Hospital bacterial infections

Your group is a team of doctors who are concerned about rising infection rates within your network of 10 hospitals. In order to determine if unclean facilities are to blame, you decide to test the hospitals for the amount of bacteria present in the surgery rooms. Because the situation is urgent and time is of the essence, you decide it would be most effective to gather data by creating a random sample of surgery rooms to test for bacteria.

Your job is to devise a sampling strategy that will create a representative sample (a sample which accurately represents the population).

As you begin to think about how to best create a representative sample you discover some critical information which might be useful:

  • The 10 hospitals are very similar in the way they clean the facilities as well as their patient populations
  • Each hospital has 15 surgery rooms which are more or less identical to the surgery rooms at the other hospitals
  • You only have the time and resources to test some of the hospitals

Guiding questions

  1. What pieces of information given above are important in selecting a sample most representative of the population we would like to survey?
  2. Which of the sampling strategies discussed in class is most likely to choose a representative sample given the information presented?
  3. How can you apply the sampling strategy you decided on in (2) to fit the context of the case study?