Probability sampling is a sampling technique in which every member of the population has a known, non-zero chance of being selected. Because these selection probabilities are known, the sampling error can be estimated, which makes it possible to generalize results statistically to the population.
Non-probability sampling is a sampling technique in which units are selected by a process that does not give every individual in the population a known, non-zero chance of selection. The probability of each member being selected is unknown, so the sampling error cannot be estimated in the same way.
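The short R sketch below shows what "known selection probabilities" buys in practice. It draws a simple random sample (each unit is included with probability n / N) from a simulated population. It then estimates the standard error of the sample mean from the sample alone, using the finite population correction. The population, its size, and the income distribution are all made up for illustration; none of it is CPS data.

```r
# Minimal sketch: under simple random sampling without replacement, every
# unit's inclusion probability is n / N, so the sampling error of the mean
# can be estimated from the sample itself.
set.seed(42)

N <- 10000                                          # hypothetical population size
population <- rgamma(N, shape = 2, scale = 20000)   # hypothetical incomes

n <- 500
srs <- sample(population, size = n)                 # simple random sample, no replacement

est_mean <- mean(srs)
# Estimated standard error with the finite population correction (1 - n / N)
est_se <- sqrt((1 - n / N) * var(srs) / n)

# Approximate 95% confidence interval for the population mean
ci <- est_mean + c(-1.96, 1.96) * est_se
cat("Estimate:", round(est_mean), " SE:", round(est_se), "\n")
cat("95% CI:", round(ci), " True mean:", round(mean(population)), "\n")
```

With a non-probability sample, no such standard error formula applies, because the inclusion probabilities that justify it are unknown.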
Probability Sampling:
Simple Random Sampling: Each member of the population has an equal chance of being selected, akin to a lottery draw.
Stratified Sampling: The population is divided into non-overlapping subgroups (strata), and members are randomly selected from each stratum, ensuring representation across key variables.
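The two probability designs can be contrasted with a small base-R sketch. The population and its `region` variable are hypothetical. The stratified sample uses proportional allocation, one common choice among several: the sample size in each stratum is proportional to that stratum's share of the population.

```r
set.seed(1)

# Hypothetical population with a stratification variable (region)
pop <- data.frame(
  id = 1:1000,
  region = rep(c("North", "South", "East", "West"), times = c(400, 300, 200, 100))
)

# Simple random sampling: every unit has the same chance of selection, n / N
srs <- pop[sample(nrow(pop), 100), ]

# Stratified sampling with proportional allocation: draw a random sample
# inside each stratum, sized in proportion to the stratum's population share
n_total <- 100
strata <- split(pop, pop$region)
stratified <- do.call(rbind, lapply(strata, function(s) {
  n_h <- round(n_total * nrow(s) / nrow(pop))
  s[sample(nrow(s), n_h), ]
}))

table(srs$region)         # stratum counts vary from one random draw to another
table(stratified$region)  # counts fixed by design: East 20, North 40, South 30, West 10
```

Under simple random sampling, a small stratum can end up over- or under-represented by chance. Stratification guarantees that each stratum gets its intended share of the sample.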
Non-Probability Sampling:
Convenience Sampling: Participants are selected based on ease of access, such as using volunteers or readily available individuals.
Purposive Sampling: Individuals are chosen based on specific characteristics or qualities, targeting a particular subset of the population.
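The sketch below illustrates why these methods can mislead. It assumes a simulated population in which a hypothetical `distance` variable stands in for how hard a person is to reach, and in which income (by construction) rises with distance. A convenience sample of the easiest-to-reach people then underestimates mean income, while a random sample of the same size does not. A purposive sample is also drawn, by selecting a hypothetical `student` trait, to show the mechanics. All variables and values are invented for illustration.

```r
set.seed(7)

# Hypothetical population: 'distance' measures how hard a person is to reach,
# and (by assumption) income increases with distance
N <- 5000
pop <- data.frame(distance = runif(N, 0, 50))
pop$income <- 30000 + 800 * pop$distance + rnorm(N, sd = 5000)
pop$student <- rbinom(N, 1, 0.2) == 1   # hypothetical characteristic

# Convenience sample: the 200 easiest-to-reach people
convenience <- pop[order(pop$distance)[1:200], ]

# Purposive sample: deliberately pick 200 people who have the trait of interest
purposive <- pop[pop$student, ][1:200, ]

# Random sample of the same size, for comparison
srs <- pop[sample(N, 200), ]

round(c(population  = mean(pop$income),
        random      = mean(srs$income),
        convenience = mean(convenience$income)))
```

The selection probabilities in the convenience and purposive samples are unknown. As a result, nothing in the sample itself reveals the size of the bias.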
The Current Population Survey (CPS) aims to represent the civilian noninstitutionalized population of the United States and is the primary source of official U.S. labor force statistics, including the unemployment rate. Its labor force statistics cover individuals aged 16 and older.
Each month, the CPS samples about 60,000 eligible households to estimate population parameters such as the unemployment rate and the employment-to-population ratio. It uses a multistage, stratified design. The United States is divided into primary sampling units (PSUs), generally counties or groups of adjacent counties. The PSUs are grouped into strata, a sample of PSUs is drawn from the strata, and housing units are then sampled within the selected PSUs. The survey responses are weighted so that the estimates represent the entire population.
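To see how weighting enters the estimates, here is a toy calculation in R. Each respondent carries a weight, roughly the number of people that respondent represents. The unemployment rate is then computed from weighted counts rather than raw counts. The records and weights are invented, and the official CPS definitions and weighting steps are considerably more detailed than this sketch.

```r
# Hypothetical respondent records: labor force status and a survey weight
# (the number of people each respondent represents). Values are made up.
resp <- data.frame(
  status = c("employed", "employed", "unemployed", "employed", "not in LF", "unemployed"),
  weight = c(1500, 2000, 1200, 1800, 2500, 900)
)

in_lf      <- resp$status %in% c("employed", "unemployed")   # labor force = employed + unemployed
unemployed <- resp$status == "unemployed"
employed   <- resp$status == "employed"

# Unweighted rate: share of sampled labor force members who are unemployed
unweighted_rate <- sum(unemployed) / sum(in_lf)

# Weighted rate: each respondent counts as 'weight' people
weighted_rate <- sum(resp$weight[unemployed]) / sum(resp$weight[in_lf])

# Weighted employment-to-population ratio
emp_pop_ratio <- sum(resp$weight[employed]) / sum(resp$weight)

c(unweighted = unweighted_rate, weighted = weighted_rate, emp_pop = emp_pop_ratio)
```

Here the unweighted rate is 2/5 = 0.40. The weighted rate is 2100/7400 (about 0.284), because the unemployed respondents happen to carry smaller weights.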
Because participation in the CPS is voluntary, nonresponse can occur, and it can bias the results if nonrespondents differ systematically from respondents. Weighting adjusts for some of these differences, but it cannot remove every source of error, and the estimates are still subject to sampling error. Overall, the CPS can be considered representative of its target population.
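The nonresponse concern can be made concrete with a simulation in R. It uses a generic weighting-class adjustment, which is not the CPS's actual nonresponse procedure. In the simulated population, younger people (by assumption) are both more often unemployed and less likely to respond, so a naive estimate from respondents is biased downward. Reweighting each respondent by the inverse of its age group's response rate removes that bias only because the simulation assumes that, within an age group, responding is unrelated to unemployment. If nonrespondents differed within groups, some bias would remain. The age groups, probabilities, and population size are all hypothetical.

```r
set.seed(3)

N <- 100000
pop <- data.frame(age_group = sample(c("16-29", "30-54", "55+"), N, replace = TRUE))

# Hypothetical unemployment probabilities by age group
p_unemp <- c("16-29" = 0.10, "30-54" = 0.04, "55+" = 0.03)
pop$unemployed <- runif(N) < p_unemp[pop$age_group]

# Hypothetical response propensities: younger people respond less often,
# but within an age group response is unrelated to unemployment
p_resp <- c("16-29" = 0.5, "30-54" = 0.8, "55+" = 0.9)
pop$responded <- runif(N) < p_resp[pop$age_group]
resp <- pop[pop$responded, ]

truth <- mean(pop$unemployed)

# Naive estimate from respondents only: the young are under-represented
naive <- mean(resp$unemployed)

# Weighting-class adjustment: weight each respondent by the inverse of the
# response rate observed in its age group
rr <- tapply(pop$responded, pop$age_group, mean)
resp$w <- as.numeric(1 / rr[resp$age_group])
adjusted <- weighted.mean(resp$unemployed, resp$w)

round(c(true = truth, naive = naive, adjusted = adjusted), 4)
```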
library(ipumsr)
library(dplyr)
# Read the IPUMS CPS extract: the DDI codebook (.xml) describes the data file
ddi <- read_ipums_ddi("cps_00002.xml")
data <- read_ipums_micro(ddi)
## Use of data from IPUMS CPS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.
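# INCWAGE codes 99999999 = NIU (not in universe) and 99999998 = missing; drop both
data2 <- data %>% filter(INCWAGE < 99999998)
library(ggplot2)
# Scatter plot of wage income by labor force status code
# (LABFORCE: 0 = NIU, 1 = not in labor force, 2 = in labor force)
ggplot(data = data2,
       mapping = aes(x = LABFORCE, y = INCWAGE)) +
  geom_point() +
  labs(x = "Labor force status (LABFORCE)", y = "Wage and salary income (INCWAGE)")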
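The scatter plot shows raw INCWAGE values and ignores the survey weights discussed earlier. A weighted summary by group is one way to compare income across LABFORCE codes. The helper `weighted_wage_by_lf` below is hypothetical. It assumes the extract includes a person-level weight, and `ASECWT` is only an assumption about that variable's name, so it is exposed as the `wt` argument. The toy data frame is made up so the function can be checked without the real extract.

```r
# Hypothetical helper: weighted mean of INCWAGE for each LABFORCE code.
# 'wt' names the weight column; "ASECWT" is an assumed default, not confirmed.
weighted_wage_by_lf <- function(df, wt = "ASECWT") {
  lf   <- as.numeric(df$LABFORCE)   # drop value labels, keep numeric codes
  wage <- as.numeric(df$INCWAGE)
  w    <- as.numeric(df[[wt]])
  # Split row indices by LABFORCE code and take a weighted mean within each group
  sapply(split(seq_along(lf), lf), function(idx) weighted.mean(wage[idx], w[idx]))
}

# Made-up records for illustration only (not CPS data)
toy <- data.frame(
  LABFORCE = c(1, 1, 2, 2, 2),
  INCWAGE  = c(0, 2000, 30000, 50000, 0),
  ASECWT   = c(1000, 1500, 1200, 800, 500)
)
weighted_wage_by_lf(toy)   # named vector: "1" = 1200, "2" = 30400
```

If the weight variable was included in the extract, calling `weighted_wage_by_lf(data2)` would apply the same summary to the filtered CPS data.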
Conclusion: LABFORCE = 0 means "not in universe" (NIU), which covers people under the age of 15. These people are not counted as unemployed, because unemployment applies only to members of the labor force. They are generally too young to work, so they have lower income. LABFORCE = 1 means not in the labor force. This seems reasonable, because these people can earn less than employed people but more than children. LABFORCE = 2 means in the labor force, a group that includes both employed and unemployed people. Because it contains the employed, this group should have the highest income.