I.

  1. Probability sampling is a sampling technique in which every member of the population has a known and non-zero chance of being selected. The key feature of probability sampling is that it allows for the calculation of the sampling error and, therefore, the statistical generalization of results to the population.
    Non-probability sampling is a sampling technique in which participants are not chosen by random selection. The likelihood of each member being selected is therefore unknown, and some members may have no chance of being selected at all, so sampling error cannot be calculated and results cannot be statistically generalized in the same way.

  2. Probability Sampling:
      Simple Random Sampling: Each member of the population has an equal chance of being selected, akin to a lottery draw.
      Stratified Sampling: The population is divided into subgroups, and members are randomly selected from each subgroup, ensuring representation across key variables.
    Non-Probability Sampling:
      Convenience Sampling: Participants are selected based on ease of access, such as using volunteers or readily available individuals.
      Purposive Sampling: Individuals are chosen based on specific characteristics or qualities, targeting a particular subset of the population.
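
As a rough illustration of how these techniques differ in practice, the sketch below draws samples from a made-up population with dplyr. The `population` table, its `region` column, and the sample sizes are all hypothetical and are not taken from any survey.

```r
library(dplyr)

set.seed(42)  # fixed seed so the random draws are reproducible

# Hypothetical population of 1,000 people in three regions of unequal size
population <- tibble(
  id     = 1:1000,
  region = rep(c("North", "South", "West"), times = c(600, 300, 100))
)

# Simple random sampling: every person has the same chance of selection
srs <- population %>% slice_sample(n = 50)

# Stratified sampling: draw 10% at random within each region
stratified <- population %>%
  group_by(region) %>%
  slice_sample(prop = 0.10) %>%
  ungroup()

# Convenience sampling: take whoever is listed first, with no random selection
convenience <- population %>% slice_head(n = 50)

# Sample size per region in the stratified sample
count(stratified, region)
```

Only the first two samples have known selection probabilities. The convenience sample always returns the same 50 people, all from one region, which is why it cannot support statistical generalization.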

II.

  1. The Current Population Survey aims to represent the civilian noninstitutionalized population of the United States. It provides labor force statistics, including the unemployment rate. The official labor force estimates from the CPS cover individuals who are 16 years of age and older.

  2. The Current Population Survey (CPS) samples about 60,000 eligible households each month to estimate population parameters such as the unemployment rate and the employment-to-population ratio. The responses are weighted so that the estimates represent the entire population. The CPS uses a multistage, stratified sampling approach: the United States is divided into primary sampling units (counties or groups of counties), which are grouped into strata. A primary sampling unit is selected from each stratum, and households are then sampled within the selected units.
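
To make the weighting step concrete, here is a minimal sketch of a weighted estimate. The five respondents, their `status` values, and the `weight` column (the number of people each respondent is assumed to represent) are invented for illustration and are not CPS figures.

```r
library(dplyr)

# Hypothetical respondents; `weight` is how many people each one represents
respondents <- tibble(
  status = c("employed", "unemployed", "employed",
             "not in labor force", "employed"),
  weight = c(1500, 2000, 1800, 1700, 1200)
)

estimates <- respondents %>%
  summarise(
    employed   = sum(weight[status == "employed"]),
    unemployed = sum(weight[status == "unemployed"]),
    population = sum(weight)
  ) %>%
  mutate(
    # Unemployment rate = unemployed / labor force (employed + unemployed)
    unemployment_rate = unemployed / (employed + unemployed),
    # Employment-to-population ratio = employed / total population
    emp_pop_ratio     = employed / population
  )

estimates
```

Because the unemployed respondent carries a larger weight than the others, the weighted unemployment rate differs from the unweighted share of one in four labor force participants. Adjusting for that difference is the point of weighting.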

  3. Participation in the CPS is voluntary, so nonresponse can bias the results if nonrespondents differ systematically from respondents. The sample is weighted to compensate, but weighting cannot remove this bias completely, and every estimate still carries sampling error. All in all, the probability-based design and the weighting make the sample reasonably representative of the target population.
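
The effect of nonresponse can be sketched with a small calculation. The population counts and response rates below are assumptions chosen only to show the mechanism, namely that a group which responds less often ends up underrepresented among respondents.

```r
library(dplyr)

# Hypothetical labor force counts and response rates by employment status
groups <- tibble(
  status        = c("employed", "unemployed"),
  population    = c(9500, 500),
  response_rate = c(0.90, 0.60)  # assumed: unemployed people respond less often
) %>%
  mutate(respondents = population * response_rate)

# True unemployment rate in the hypothetical population
true_rate <- with(groups, population[status == "unemployed"] / sum(population))

# Unweighted rate seen among the people who actually responded
observed_rate <- with(groups, respondents[status == "unemployed"] / sum(respondents))

c(true = true_rate, observed = observed_rate)
```

Under these assumed response rates the unadjusted estimate understates unemployment, which is the kind of gap that nonresponse adjustments in the weights try to close.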

library(ipumsr)
library(dplyr)
# Read the IPUMS CPS extract: the DDI codebook first, then the microdata it describes
ddi <- read_ipums_ddi("cps_00002.xml")
data <- read_ipums_micro(ddi)
# Drop records where INCWAGE is 99999999, the code IPUMS uses for "not in universe"
data2 <- data %>% filter(INCWAGE != 99999999)
library(ggplot2)
# Scatter plot of wage and salary income against the labor force status code
ggplot(data = data2,
       mapping = aes(x = LABFORCE,
                     y = INCWAGE)) +
  geom_point() +
  labs(x = "Labor Force", y = "Income")


Conclusion: Labor Force = 0 is the not-in-universe group, which consists of people under the age of 15. They are not counted as unemployed; they fall outside the labor force classification altogether because they are too young to work. As a result, they have little or no wage income. Labor Force = 1 means they are not in the labor force. I think it makes sense that this group earns less than people in the labor force but more than children. Labor Force = 2 means they are in the labor force, which includes both employed people and unemployed people who are looking for work, and this group should have the highest income.
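
One way to check this reading of the plot is to summarize income by group instead of judging it by eye. The sketch below uses a small made-up table, not the CPS extract, and assumes the coding 0 = not in universe, 1 = not in the labor force, 2 = in the labor force. The same pipeline could be applied to `data2`.

```r
library(dplyr)

# Made-up records for illustration only; these are not values from the extract
toy <- tibble(
  LABFORCE = c(1, 1, 2, 2, 2, 1),
  INCWAGE  = c(0, 5000, 42000, 30000, 61000, 0)
)

income_by_group <- toy %>%
  # Attach readable labels to the assumed LABFORCE codes
  mutate(LABFORCE = factor(LABFORCE,
                           levels = c(0, 1, 2),
                           labels = c("NIU", "Not in labor force", "In labor force"))) %>%
  group_by(LABFORCE) %>%
  summarise(
    n             = n(),
    median_income = median(INCWAGE),
    mean_income   = mean(INCWAGE)
  )

income_by_group
```

A table of medians per group makes the comparison between groups explicit, which a scatter plot with only three x positions shows poorly because the points overlap.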