Pedro Fernandes · Read aloud · ~3–4 minutes
Ricardo speaks slide 1. Pedro speaks slides 2–7.
Hello. I am Ricardo, and Pedro and I will present our Sampling Methods project. pause This project was a practical exercise: applying sampling theory directly to data. We designed, applied, and compared three sampling strategies. The entire project was implemented in R.
· Look at the audience — not the screen
· Speak slowly · breathe here
The dataset was provided by Professor Pedro Simões Coelho. It contains fifty thousand individuals described by thirteen variables, and it is treated as the complete target population, meaning we have access to every observation. pause The project has three objectives: first, compare three sampling designs; second, identify potential issues in each one; third, explain how we addressed them.
· Full population · 50,000 individuals · given by the professor
· True values are known · this is what makes the comparison objective
· Three objectives — compare · issues · solutions
The comparison uses a fixed precision framework. All three designs target the same precision requirements at 95% confidence. pause Mean income within 830 euros. Poverty rate within 3 percentage points. Total expenditure within 3% of the population total. pause Because the targets are the same for all designs, efficiency is measured by sample size: the design that meets all targets with the smallest sample wins. A fixed random seed was used in all designs to ensure the results are fully reproducible.
· Same targets for all — smallest n wins
· Income ±€830 · Poverty ±3 pp · Expenditure ±3%
· Z = 1.96 · 95% confidence · fixed seed → reproducible
If asked why absolute precision for poverty: “For a proportion, absolute precision is more meaningful than relative. Three percentage points is a standard target in poverty estimation.”
If asked about using the full population: “Having the full population lets us verify the estimates against the true values, which makes the comparison between designs objective.”
Three designs were implemented. pause Design one, Simple Random Sampling, is the baseline. Design two, Stratified Sampling, divides the population into four groups by education level, with two allocation methods: proportional and optimum. Design three, Cluster PPS, selects entire neighbourhoods with probability proportional to size (PPS).
· SRS · baseline · equal probability · no structure
· Stratified · education · proportional + Neyman allocation
· Cluster PPS · select neighbourhoods · size-proportional
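Backup note on the two allocation methods: a minimal Python sketch, for illustration only (the project itself was implemented in R, and this is not the project code). It assumes the textbook definitions: proportional allocation gives each stratum a share of n equal to its population share, and optimum (Neyman) allocation gives a share proportional to stratum size times stratum standard deviation, assuming equal cost per unit. The stratum sizes and standard deviations below are hypothetical, not the project's values.

```python
def proportional_allocation(n, stratum_sizes):
    """Share of n for each stratum in proportion to its population size N_h."""
    total = sum(stratum_sizes)
    return [n * N_h / total for N_h in stratum_sizes]


def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Share of n for each stratum in proportion to N_h * S_h (equal unit cost assumed)."""
    weights = [N_h * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Hypothetical inputs: four strata (e.g. education levels), made-up sizes and SDs.
sizes = [20000, 15000, 10000, 5000]
sds = [1.0, 2.0, 3.0, 4.0]
n = 1000

prop = proportional_allocation(n, sizes)
ney = neyman_allocation(n, sizes, sds)
print("proportional:", prop)
print("Neyman:", ney)
```

In practice the shares are rounded to whole units, and Neyman allocation moves sample towards the strata that are more variable.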
In SRS, every individual has the same inclusion probability. Sampling is done without replacement. pause The sample size formula was applied separately to each parameter. Mean income required 1,026. Poverty rate required 829. Total expenditure required 1,128, the largest, the one circled on the slide. pause Since a single sample serves all three parameters, the binding sample size is the maximum: n equals one thousand one hundred and twenty-eight. Each individual had an inclusion probability of 2.26%.
· Income 1,026 · Poverty 821 · Expenditure 1,128 ← binding
· n = 1,128 · π = 2.256%
If asked why expenditure is binding: “Expenditure has the highest variance. Higher variance requires a larger sample to achieve the same precision.”
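Backup note on the sample size step: a minimal Python sketch of how a size can be computed per parameter and the maximum taken as binding. It is illustrative only (the project was implemented in R). It assumes one common textbook form, an initial size n0 corrected for the finite population as n0 / (1 + n0 / N), which may differ from the exact formula used in the project. The function names and the planning values (S, p, cv) are hypothetical and are not the project's inputs.

```python
import math

Z = 1.96  # 95% confidence


def srs_size(n0, N):
    """Apply the finite population correction to an initial size n0, then round up."""
    return math.ceil(n0 / (1 + n0 / N))


def n_for_mean(S, d, N, z=Z):
    """Size to estimate a mean within absolute margin d, given a planning SD S."""
    return srs_size((z * S / d) ** 2, N)


def n_for_proportion(p, d, N, z=Z):
    """Size to estimate a proportion within absolute margin d, given a planning value p."""
    return srs_size(z ** 2 * p * (1 - p) / d ** 2, N)


def n_for_total(cv, r, N, z=Z):
    """Size to estimate a total within relative margin r, given cv = S / mean."""
    return srs_size((z * cv / r) ** 2, N)


# Hypothetical planning values (NOT the project's figures).
N = 50_000
required = {
    "mean income": n_for_mean(S=10_000, d=830, N=N),
    "poverty rate": n_for_proportion(p=0.25, d=0.03, N=N),
    "total expenditure": n_for_total(cv=0.5, r=0.03, N=N),
}
binding = max(required.values())  # one sample must satisfy every target
print(required, "binding n =", binding)
```

Taking the maximum guarantees that the other two parameters are estimated at least as precisely as required.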
This table summarises the estimators and variance estimators used under SRS. pause The variance formulas all include the finite population correction, one minus f, where f is the sampling fraction. With a sampling fraction of 2.26%, the correction is small, but we decided to include it for accuracy. pause For the poverty rate, the variance uses p-hat times one minus p-hat, divided by n minus one, the standard formula for a binary variable.
· FPC = (1 − f) — f = 2.3% — small but required
· Poverty variance: p̂(1 − p̂) / (n − 1)
· All CIs at Z = 1.96
If asked about the FPC: “The correction reduces the variance when sampling without replacement from a finite population. With a small sampling fraction like ours, the effect is minor, but theoretically required.”
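Backup note on the estimators: a minimal Python sketch of the SRS estimators and variance estimators described above, with the finite population correction (1 − f) and Z = 1.96. It is illustrative only (the project was implemented in R). The total is assumed to be estimated by the standard expansion N times the sample mean, and the function names and toy data are hypothetical.

```python
import math

Z = 1.96  # 95% confidence


def srs_mean_ci(y, N, z=Z):
    """Sample mean, its estimated variance (1 - f) * s^2 / n, and the confidence interval."""
    n = len(y)
    f = n / N  # sampling fraction
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
    var = (1 - f) * s2 / n
    half = z * math.sqrt(var)
    return ybar, var, (ybar - half, ybar + half)


def srs_proportion_ci(flags, N, z=Z):
    """Sample proportion with variance (1 - f) * p(1 - p) / (n - 1), for 0/1 data."""
    n = len(flags)
    f = n / N
    p = sum(flags) / n
    var = (1 - f) * p * (1 - p) / (n - 1)
    half = z * math.sqrt(var)
    return p, var, (p - half, p + half)


def srs_total_ci(y, N, z=Z):
    """Estimated total N * ybar, with variance N^2 times the variance of the mean."""
    ybar, var_mean, _ = srs_mean_ci(y, N, z)
    total = N * ybar
    var = N ** 2 * var_mean
    half = z * math.sqrt(var)
    return total, var, (total - half, total + half)


# Toy data (hypothetical): a sample of 5 from a population of 50.
y = [1, 2, 3, 4, 5]
poor = [1, 0, 0, 1, 0]
print(srs_mean_ci(y, N=50))
print(srs_proportion_ci(poor, N=50))
print(srs_total_ci(y, N=50))
```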
All three precision targets were met. pause Mean income was estimated at twenty-seven thousand nine hundred euros, with a relative precision of two point eight seven percent. The poverty rate at twenty-five point seven percent, with an absolute precision of two point five two percentage points. Total expenditure at approximately one billion euros, with a relative precision of three point zero one percent — just at the target. pause The true population values are inside all confidence intervals. Income and poverty were more precise than required, because the sample was sized for the hardest criterion — total expenditure. pause This is Design one. It is the baseline. The other two designs will be compared against n equals one thousand one hundred and twenty-eight.
· All three targets met ✓
· True values inside all 95% CIs ✓
· Income and poverty: better than required · driven by expenditure
· Baseline: n = 1,128
If asked about poverty relative precision showing 9.81%: “The relative precision looks large because the estimate is a small proportion. The relevant measure is absolute, two point five two percentage points, which is well within the three-percentage-point target.”
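Backup arithmetic for that answer, using only the figures quoted in the script: the relative precision is the absolute half-width of the interval divided by the estimate.

```latex
\[
\text{relative precision}
  = \frac{\text{absolute precision}}{\hat{p}}
  = \frac{2.52\ \text{pp}}{25.7\%}
  \approx 0.0981 = 9.81\%
\]
```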
Sampling Methods · NOVA IMS · 2025–2026 · Professor Pedro Simões Coelho