Example
Women and Love: A Cultural Revolution in Progress, by Shere Hite (1987)
Sample Surveys
Characteristics of a target population are of interest
Impractical or impossible to observe the whole population
Select a sample of units in the population
Use data from sampled units to estimate characteristics of the entire population
Hite’s survey design
- Sample
- The sample was self-selected. Mailed 100K questionnaires \(\rightarrow\) only 4.5% returned (i.e., a low response rate)
- Addresses drawn from a broad range of special groups \(\rightarrow\) excludes many women in the population
- Questionnaire
- 127 essay questions \(\rightarrow\) high respondent burden, nonresponse bias (who completes?)
- Question wording vague (“in love” has many different interpretations)
- Leading questions
Example - Continued
- What is a good sample? - “Representativeness”
A good sample should reproduce the characteristics of interest in the population, as closely as possible.
- What else? - accurate measurement
We should get answers as accurately as possible
Survey Sampling
| Psychology, Cognitive Science | Statistics |
|---|---|
| Studies nonsampling error | Studies sampling error |
| Questionnaire design | Sampling design, estimation |
Sir Francis Galton (1822-1911)

Galton was a polymath who made important contributions in many fields of science, including meteorology (the anticyclone and the first popular weather maps), statistics (regression and correlation), psychology (synesthesia), biology (the nature and mechanism of heredity), and criminology (fingerprints).
He pioneered the use of questionnaires and surveys for collecting data on human communities.
Jerzy Neyman (1894-1981)

Showed that random sampling can provide a representative sample.
Neyman, J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97(4), 558–625
Morris Hansen (1910-1990)

- Developed sampling designs for official statistics at the Census Bureau and the Bureau of Labor Statistics
- A pioneer in making survey sampling the gold standard for data collection in government agencies
- Olkin, I. (1987). A conversation with Morris Hansen. Statistical Science 2, 162-179
Why sampling?
- To reduce cost (efficiency)
- To obtain information faster (timeliness)
- Sometimes sampling is the only way to obtain information about the population, for example when inspection destroys the item (e.g., quality inspection in an automobile company)
Introduction
- Surveys vary greatly along several dimensions
- Scope of study objectives
- Scale of the target population
- Data collection methods
- Illustrate variety in surveys with a few examples
1. Scope of study objectives
- Some surveys focus on a relatively small number of parameters
- Crop Yield Survey: Estimate crop yield through laboratory analysis of plant measurements
2. Target population
- The extent of the target population needs to be defined precisely
- Some surveys study relatively broad domains
- Others are narrower
- The National Agricultural Workers Survey is restricted to the agricultural sector of the labor force.
3. Data Collection
- Interviewer-mediated: face-to-face survey, telephone survey
- Self-administered: Paper, Web-based survey
- Field observation: Data observed at sample sites (e.g. ground, tree, water, air)
- A single survey may use multiple data collection modes
- For example, the National Health and Nutrition Examination Survey uses face-to-face interviews and physical examinations.
Survey Process
Multiple components: Planning & design, sample preparation, data collection, data analysis
Collaboration between the statisticians and the investigators is crucial throughout the survey process
Components can occur in parallel, and the process is often iterative
Survey Process: Define precise objectives
- Parameter of interest
- What estimates are of interest?
- What characteristics of the target population do we want to measure?
- Translate general concepts into specific, measurable quantities
- Target population
- Precisely, what population do we want to study?
- Eligibility criteria: Age groups, Geographic scope
- Reference time points
Example
- An investigator wants to study “attitudes of Iowa State students about doing volunteer work.”
- Measurable and quantifiable concepts
- Proportion of students who are “very likely” to volunteer in 2020
- Total hours of volunteer work completed in 2019
- Target population: Iowa State students
- Include graduate students?
- Include part-time students?
Survey Process: Develop Data Collection Protocols
- Questions or measurement techniques that capture the quantities of interest
- Construct questionnaire or other data collection forms (Ex: “How many hours of volunteer work did you complete in 2019?”)
- Pre-test and revise the questionnaire/form (Ex: Expand the question to specify different types of volunteer work - define volunteer work precisely)
- Train interviewers, data collectors
Survey Process: Represent population in a frame
- To draw the sample, we need a representation of the population in a physical format that enables us to identify and select units
- Frame: list of units from which we select the sample
- Examples
- List: e.g., a telephone directory
- Map: geographic representation of locations of interest
- More details on frames to come
Survey Process: Sample Design and Selection
- Determine a sample size
- Precision of estimates
- Costs / budget
- Choose a sample design
- Distribute the sample to obtain precise estimates of characteristics of interest
- Practical factors often impact design
- Select the sample
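To see how precision and budget interact, one standard planning calculation for estimating a proportion under simple random sampling fixes a margin of error e, computes \(n_0 = z^2 p(1-p)/e^2\), and then applies the finite population correction \(n = n_0/(1 + n_0/N)\). The sketch below is a minimal illustration that assumes SRS, a normal approximation, and the conservative planning value p = 0.5; the function name and the population size in the example are hypothetical.

```python
import math

def srs_sample_size(margin: float, N: int, p: float = 0.5, z: float = 1.96) -> int:
    """Approximate SRS sample size for estimating a proportion (hypothetical helper).

    margin: desired half-width of the confidence interval, e.g. 0.03
    N:      population size
    p:      planning guess for the proportion (0.5 is the most conservative)
    z:      normal critical value (1.96 for roughly 95% confidence)
    """
    # Sample size if the population were effectively infinite
    n0 = z**2 * p * (1 - p) / margin**2
    # Finite population correction: fewer units are needed when N is small
    n = n0 / (1 + n0 / N)
    return math.ceil(n)

# Hypothetical: +/- 3 percentage points in a population of 30,000 students
print(srs_sample_size(0.03, 30_000))
```

Halving the margin of error roughly quadruples \(n_0\), which is why the precision target and the budget have to be settled together.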
Survey Process: Collect and prepare data
- Collect data (interview, observe, self-administer)
- Edit and code data
- Correct errors if possible
- Translate responses into numeric codes for analysis
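Editing and coding can be illustrated with a small sketch that normalizes raw answers and translates them into numeric codes, flagging anything it cannot code. The response categories, code values, and the use of -9 for a missing or uncodable answer are hypothetical conventions chosen for this example, not a standard.

```python
from typing import Optional

# Hypothetical codebook for "How likely are you to volunteer in 2020?"
CODEBOOK = {
    "very likely": 4,
    "somewhat likely": 3,
    "somewhat unlikely": 2,
    "very unlikely": 1,
}
MISSING = -9  # hypothetical code for missing or uncodable answers

def code_response(raw: Optional[str]) -> int:
    """Translate one raw answer into a numeric code."""
    if raw is None:
        return MISSING
    # Light editing: normalize case and collapse stray whitespace before lookup
    cleaned = " ".join(raw.strip().lower().split())
    return CODEBOOK.get(cleaned, MISSING)

raw_answers = ["Very likely", "somewhat  unlikely ", None, "maybe"]
codes = [code_response(a) for a in raw_answers]
print(codes)  # [4, 2, -9, -9]
# Records coded MISSING are candidates for follow-up during editing
```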
Survey Process: Data Analysis
- Exploratory analysis
- Check for missing values, outliers, potential errors
- Examine relationships between survey responses and auxiliary information from external sources
- Estimation
- Compute a “survey weight” that projects the sample onto the larger population
- Estimation methods are tied to the survey design
- Variance estimation
- Quantify the uncertainty in the estimator
- Standard error, confidence interval, coefficient of variation
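As a concrete illustration of weights, estimation, and variance estimation, the sketch below assumes simple random sampling without replacement, where every sampled unit gets the same weight N/n. It computes the estimated mean and total, the standard error of the mean with the finite population correction \(1 - n/N\), an approximate 95% confidence interval, and the coefficient of variation. The data and N are made up, and the dictionary keys are illustrative names.

```python
import math
import statistics

def srs_estimates(sample, N, z=1.96):
    """Design-based estimates under SRS without replacement (hypothetical helper)."""
    n = len(sample)
    weight = N / n                      # each sampled unit represents N/n population units
    ybar = statistics.mean(sample)      # estimated population mean
    total = weight * sum(sample)        # estimated population total (equals N * ybar)
    s2 = statistics.variance(sample)    # sample variance, divisor n - 1
    fpc = 1 - n / N                     # finite population correction
    se_mean = math.sqrt(fpc * s2 / n)   # standard error of the sample mean
    return {
        "weight": weight,
        "mean": ybar,
        "total": total,
        "se_mean": se_mean,
        "se_total": N * se_mean,
        "ci_mean": (ybar - z * se_mean, ybar + z * se_mean),
        "cv_mean": se_mean / ybar,      # coefficient of variation of the mean
    }

# Hypothetical: volunteer hours reported by n = 5 students sampled from N = 1000
hours = [0, 10, 4, 6, 20]
est = srs_estimates(hours, N=1000)
print(est["mean"], est["total"], round(est["se_mean"], 3))
```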
Survey Process and Stat 421 emphasis
- Step 1: Study Planning & Survey Design
- Define objectives, target population & parameters of interest
- Choose sampling design
- Choose data collection method
- Step 2: Preparation
- Create sampling frame
- Select sample
- Develop questions or measurements
- Pre-test & revise questionnaire/form
- Step 3: Collect and Prepare data
- Collect data
- Code data
- Edit data file
- Step 4: Data Analysis
- Calculate estimates of parameters
- Make inference about the population
Stat 421: Sample Design and Estimation
- Each sample design has its own estimators of population parameters (e.g., estimators of the population mean)
- Estimation and variance estimation depend on the properties of the sample design.
- Estimation in surveys involves computing a survey weight. This weight projects the sample onto the larger population.
- Objectives and survey designs are integrally related.
- Many different survey sample designs.
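When a design gives units unequal chances of selection, one standard way to build the survey weight is to take the inverse of each unit's inclusion probability, \(w_i = 1/\pi_i\); the weighted (Horvitz-Thompson) estimator of the population total is then \(\hat{t} = \sum_{i \in S} w_i y_i\). The sketch assumes the inclusion probabilities are known from the design; the farm data and probabilities are invented for illustration.

```python
def ht_total(y, pi):
    """Horvitz-Thompson estimate of a population total (hypothetical helper).

    y:  values observed on the sampled units
    pi: each sampled unit's inclusion probability, known from the design
    """
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    weights = [1 / p for p in pi]  # survey weight: population units represented by each sampled unit
    total = sum(w * yi for w, yi in zip(weights, y))
    return total, weights

# Hypothetical sample of 3 farms; larger farms had higher inclusion probabilities
y = [120.0, 40.0, 10.0]   # e.g., acres of corn
pi = [0.5, 0.2, 0.1]
total, w = ht_total(y, pi)
print(w, total)  # weights 2, 5, 10; total 240 + 200 + 100 = 540
```

Under simple random sampling every \(\pi_i = n/N\), so this reduces to the equal weight N/n used for the SRS total.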
Probability Sampling Designs
- Basic Selection Methods
- Simple random sampling (Ch 2.3): Randomly select units from a list using an equal-probability selection method (e.g., draw chips from a bowl)
- Systematic sampling (Ch 2.7, 5.5): Sort units in frame, random start, take every k-th unit
- Sampling with probability proportional to a size or importance measure (Ch 6.2.3): Uses extra information on units; larger or more important units have a higher chance of being included in the sample
- Stratified sampling (Ch 3): Divide population into groups (strata); select an independent sample from each stratum
- Cluster sampling (Ch 5 & 6): Population units are aggregated into larger units called clusters; whole clusters are sampled in the selection process
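The five selection methods can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions: the frame, strata, clusters, and size measure are hypothetical; PPS is shown as independent draws with replacement, which is only one of several ways to implement PPS; and the functions only select units, they do not compute estimates.

```python
import random

def srs(frame, n, rng):
    """Simple random sample without replacement: every subset of size n equally likely."""
    return rng.sample(frame, n)

def systematic(frame, k, rng):
    """Systematic sample: random start in 0..k-1, then every k-th unit of the sorted frame."""
    start = rng.randrange(k)
    return frame[start::k]

def pps_with_replacement(frame, sizes, n, rng):
    """n independent draws, each picking a unit with probability proportional to its size."""
    return rng.choices(frame, weights=sizes, k=n)

def stratified(strata, n_per_stratum, rng):
    """Independent SRS within each stratum (strata: dict of stratum name -> list of units)."""
    return {h: rng.sample(units, n_per_stratum[h]) for h, units in strata.items()}

def cluster(clusters, m, rng):
    """One-stage cluster sample: SRS of m clusters, keeping every unit in the chosen clusters."""
    chosen = rng.sample(list(clusters), m)
    return {c: clusters[c] for c in chosen}

rng = random.Random(421)          # fixed seed so the example is reproducible
frame = list(range(1, 21))        # hypothetical frame of 20 unit IDs
print(srs(frame, 5, rng))
print(systematic(sorted(frame), 4, rng))
sizes = list(frame)               # hypothetical size measure: unit i has size i
print(pps_with_replacement(frame, sizes, 3, rng))
strata = {"urban": frame[:12], "rural": frame[12:]}
print(stratified(strata, {"urban": 3, "rural": 2}, rng))
schools = {"A": [1, 2, 3], "B": [4, 5], "C": [6, 7, 8, 9]}  # hypothetical clusters of children
print(cluster(schools, 2, rng))
```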
Choosing the probability sampling design
- Statistical factors
- Consider survey objectives
- Most precise estimate
- Likelihood of generating a representative sample
- EX: a stratified design uses strata to legitimately exclude some samples that are unlikely to be representative
- Practical factors
- A list (or sampling frame) may not exist for elements of the population
- EX: A cluster design is needed when we want to sample elementary school children, but we have a list of schools - not a list of children.
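The claim that stratification excludes unrepresentative samples can be checked by brute force on a tiny hypothetical population: enumerate every possible sample under SRS and under a stratified design, and compare the resulting sample means. Every sample is equally likely under each design here, so the variance of these means over all possible samples is the design variance of the estimator.

```python
from itertools import combinations, product
from statistics import mean, pvariance

# Hypothetical population of N = 4 units in two strata of equal size
low = [1, 2]     # stratum 1
high = [9, 10]   # stratum 2
population = low + high

# SRS of n = 2: all C(4, 2) = 6 samples are possible
srs_means = [mean(s) for s in combinations(population, 2)]

# Stratified sample with one unit per stratum: only 2 * 2 = 4 samples are possible.
# With equal stratum sizes, the stratified estimator is the simple average.
strat_means = [mean(s) for s in product(low, high)]

print(sorted(srs_means))    # includes 1.5 and 9.5 from samples (1, 2) and (9, 10)
print(sorted(strat_means))  # those unrepresentative samples cannot occur
print(pvariance(srs_means), pvariance(strat_means))
```

Both designs are unbiased for the population mean 5.5, but the SRS variance is about 5.42 while the stratified variance is 0.125; the stratified samples are exactly the SRS samples minus the two extreme ones.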
Survey Estimation
- Inference in surveys should be compatible with the survey design
- Weights
- Estimators
- Variance estimators (confidence intervals)
Broad Syllabus
- Basic concepts in probability sampling and the survey process (2 weeks)
- Simple random sampling (2 weeks)
- Using extra information in sample selection (4 weeks)
- Systematic sampling
- Probability proportional to size
- Stratified sampling
- Using extra information in estimation (in context of simple random sampling) (2 weeks)
- Ratio estimation
- Regression estimation
- More complex estimation problems (2 weeks)
- Domain estimation
- Nonresponse
- Cluster sampling (2 weeks)
- Single stage
- Two-stage cluster sampling
Part 2: Foundations of Survey Sampling
Survey Design
- Survey design involves selecting methods to address all phases of the survey process
- Objectives
- Sample Design
- Data Collection
- Analysis approach
- Weights
- Estimation
- Variance estimation
Population and Sample

Definition
- Target population
- The entire set of units for which the survey data are to be used to make inferences.
- Thus, the target population defines those units for which the findings of the survey are meant to generalize.
- Survey Population
- The population from which the sample can be taken.
- Sampling frame
- The actual list (or map) of units in the survey population from which the sample is drawn
- Observational Units (elements)
- An object on which a measurement is taken; the members of the population
Finite Population
- The target population contains a FINITE NUMBER of units
- \(N\) = Total number of elements in the population
- Differs from notions of a population in other statistics courses
- Infinite population defined by all possible realizations from a distribution, such as a normal distribution
- In those courses, analysis proceeds as if the population were infinite
- In Stat 421, we only consider finite population
- Population is a finite collection of \(N\) units.
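Because the population is a finite list of N values, parameters such as the population total, mean, and variance are fixed numbers that could be computed exactly if every unit were observed; randomness enters only through which units are selected. The sketch uses made-up values and the N - 1 divisor for the population variance, a common convention in survey sampling texts (some texts divide by N instead).

```python
from statistics import variance

# Hypothetical finite population: the y-value of every one of the N units
y = [3, 7, 2, 8, 5, 5]
N = len(y)

t_U = sum(y)        # population total
ybar_U = t_U / N    # population mean
S2 = variance(y)    # population variance, divisor N - 1

print(N, t_U, ybar_U, S2)
```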
Example
- Suppose that we are interested in the readership of the Des Moines Register among Iowa adults
- We decide to estimate the percent of adults (ages 18 or older) residing in Iowa who read the Des Moines Register during the week of Jan 13th-18th 2020
- Target population: All adults ages 18 or older residing in Iowa during the week of Jan 13th-18th of 2020.
- Element: Adult (individual 18 or older)
- Population size: N = number of Iowa residents ages 18 or older (fewer than the 3.16 million total Iowa residents in the Census Bureau 2019 estimate)
Target population: Complexities
- Target populations are often difficult to define
- Example: Political poll for an election – What population should we target?
- Registered voters?
- Voters in the last election?
- Those “likely to vote” in the next election?
Element/Observation Unit: Complexities
Sampling Frame
- Telephone survey: sampling frame may be a list of telephone numbers
- Face-to-face interview survey: sampling frame may be a list of addresses
- Agricultural survey: sampling frame may be a map of areas containing farms
Sampling Frame: Complexities
Constructing a sampling frame that accurately reflects the target population can be a challenge.
- Units in the population may be excluded from the frame (this is called the undercoverage problem)
- Units in the frame may not be in the target population (overcoverage)
If “frame” \(\neq\) “target population”, the resulting error is called coverage error.
- Example: What is the average payroll among Iowa businesses with more than 5 employees in 2020?
- Frame = list of businesses with more than 5 employees from 2019 tax records
- New businesses in 2020: In the population but not the frame
- Businesses that closed in 2020: In the frame but not the population
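The payroll example reduces to set differences between the frame and the target population: units in the population but not the frame are undercovered, and units in the frame but not the population (sometimes called overcoverage) are ineligible. The business IDs below are hypothetical placeholders, not real data.

```python
# Hypothetical business IDs (not real data)
frame_2019 = {"B01", "B02", "B03", "B04"}        # frame: >5 employees in 2019 tax records
population_2020 = {"B02", "B03", "B04", "B05"}   # target: >5 employees in 2020

undercovered = population_2020 - frame_2019  # in the population, missing from the frame
overcovered = frame_2019 - population_2020   # in the frame, not in the population
covered = frame_2019 & population_2020       # correctly covered units

print(sorted(undercovered))  # ['B05'], e.g. a business that opened in 2020
print(sorted(overcovered))   # ['B01'], e.g. a business that closed in 2020
```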
Sampling Frame Types: List and Area Frames
- List frames
- Examples: telephone numbers, addresses
- Strength: may contain good auxiliary information about the population
- Weakness: may exclude members of the population
- Area frames: geographic representation
- Examples: Map, area divided into parcels or tracts
- Strength: may completely cover the population
- Weakness: may have little auxiliary information; may contain ineligible units
List and Area Frame Examples
- National Crime and Victimization Survey
- What percent of US households were victimized by crime in 2019?
- Frame: list of households from US Census information and building permits
- Census Bureau area frame
- Divides US area into tracts, block groups, and blocks
- Blocks are clusters of households
- Block groups are clusters of blocks
- Tracts are clusters of block groups (and blocks)
Census Tract and Block Groups

Sample
- Sample: A subset of the survey population
- Sampled population: Collection of all possible observation units that might have been chosen in a sample
- Ideally, the sampled population is equal to the target population
- Why might the sampled population differ from the target population?
Population
