DS 8

Haiding Luo

2023-11-06

I. SAMPLING METHODOLOGIES

Probability sampling is a random sampling method in which each unit in the population has a known or calculable probability of being selected into the sample. Because selection is random, probability sampling is more likely to produce a representative sample that reflects the characteristics of the population, and it allows the sampling error of estimates to be quantified. Common methods of probability sampling include simple random sampling, stratified sampling, systematic sampling, and multi-stage sampling.
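Stratified sampling is named above but not described further. The sketch below shows one common variant, proportional allocation. It assumes a toy population frame with a hypothetical `region` column used as the stratum and draws the same fraction of units at random within each stratum.

```r
set.seed(42)  # make the random draws reproducible

# Hypothetical population frame: 1,000 units split into three strata
population <- data.frame(
  id     = 1:1000,
  region = rep(c("North", "South", "West"), times = c(500, 300, 200))
)

# Proportional allocation: draw the same fraction at random within each stratum
stratified_sample <- function(frame, strata_col, fraction) {
  strata <- split(frame, frame[[strata_col]])
  picked <- lapply(strata, function(g) {
    n_g <- round(nrow(g) * fraction)          # stratum sample size
    g[sample(nrow(g), n_g), , drop = FALSE]   # simple random sample within stratum
  })
  do.call(rbind, picked)
}

strat <- stratified_sample(population, "region", fraction = 0.1)
table(strat$region)  # 50 North, 30 South, 20 West
```

Because every stratum is sampled, each group is guaranteed to appear in the sample in proportion to its population share.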

Non-probability sampling is a non-random sampling method in which sample selection relies on the researcher's subjective judgment or on specific criteria. Because selection is not random, non-probability samples may be less representative and may not accurately reflect the characteristics of the population. Common non-probability sampling methods include convenience sampling, judgmental (or purposive) sampling, and quota sampling.
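Quota sampling shows how a non-probability method can match population proportions and still not be random. The following is a minimal sketch, assuming a hypothetical stream of passers-by and researcher-chosen quotas (the names `passers_by` and `quota` are illustrative). Units are accepted in the order they are met until each group's quota is full.

```r
set.seed(1)

# Hypothetical stream of passers-by, in the order the interviewer meets them
passers_by <- data.frame(
  id     = 1:500,
  gender = sample(c("female", "male"), 500, replace = TRUE)
)

# Quotas chosen by the researcher (hypothetical numbers)
quota <- c(female = 20, male = 20)

quota_sample <- function(stream, group_col, quota) {
  counts <- setNames(numeric(length(quota)), names(quota))
  taken  <- integer(0)
  for (i in seq_len(nrow(stream))) {
    g <- stream[[group_col]][i]
    if (counts[[g]] < quota[[g]]) {   # accept only while this group's quota is open
      taken <- c(taken, i)
      counts[[g]] <- counts[[g]] + 1
    }
    if (all(counts >= quota)) break   # stop once every quota is filled
  }
  stream[taken, , drop = FALSE]
}

quota_s <- quota_sample(passers_by, "gender", quota)
table(quota_s$gender)  # 20 female, 20 male
```

The group counts are fixed by design, but which individuals end up in the sample depends only on who happened to arrive first.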

Simple random sampling is a probability sampling method in which the sample is drawn from the population entirely at random, so that each unit has an equal chance of being selected. When n units are drawn without replacement from a population of N units, each unit's probability of selection is known and equal to n/N, and every possible sample of size n is equally likely.
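A short illustration in base R, assuming a population of N = 1000 units identified by their index and a sample size of n = 50. `sample()` draws without replacement, and repeating the draw many times lets us check that a given unit is selected with probability close to n/N.

```r
set.seed(123)

N <- 1000                  # population size
n <- 50                    # sample size
population_ids <- 1:N

# Simple random sample without replacement
srs <- sample(population_ids, size = n, replace = FALSE)

# Each unit's inclusion probability is n / N = 0.05; check it empirically
# by counting how often unit 1 is drawn over many repeated samples
hits <- replicate(10000, 1 %in% sample(population_ids, size = n))
mean(hits)                 # close to 0.05
```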

Systematic sampling is a probability sampling method in which units are selected from an ordered list of the population at fixed intervals. The researcher computes the sampling interval k = N/n (population size divided by the desired sample size), randomly selects a starting point between 1 and k, and then takes every k-th unit until the desired sample size is reached.
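A minimal sketch of these steps, assuming the population is an ordered list of N = 1000 units and the desired sample size is n = 50, which gives an interval of k = 20.

```r
set.seed(7)

N <- 1000                  # population size (units listed in some order)
n <- 50                    # desired sample size
k <- N %/% n               # sampling interval, 20 here

start <- sample(seq_len(k), 1)                           # random start between 1 and k
systematic <- seq(from = start, by = k, length.out = n)  # every k-th unit after the start

head(systematic)
```

When N is not an exact multiple of n, integer division leaves the last few units in the list with no chance of selection. Treating the list as circular is one way to avoid this.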

Convenience sampling selects individuals or groups that are easily accessible to the researcher. Its main advantage is simplicity and speed, which allow a large amount of data to be collected quickly. However, because the sample is not chosen at random, it is often unrepresentative of the population, which can lead to biased conclusions.
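The bias can be made concrete with a small simulation. The setup is entirely hypothetical. It assumes that younger people are easier to reach, so that accessibility is correlated with age. It then compares the mean age of a convenience sample with that of a simple random sample of the same size.

```r
set.seed(2023)

# Hypothetical population of 10,000 adults aged 18 to 80. We assume younger
# people are easier to reach (e.g., students near campus), so the order in
# which people are reached is correlated with age.
age <- round(runif(10000, min = 18, max = 80))
reach_order <- order(age + rnorm(10000, sd = 5))   # easiest to reach first

convenience <- age[reach_order[1:200]]   # the 200 most accessible people
random      <- sample(age, 200)          # simple random sample, for comparison

round(c(population  = mean(age),
        convenience = mean(convenience),
        random      = mean(random)), 1)
```

The random sample's mean lands near the population mean. The convenience sample's mean is far lower because accessibility and age are related.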

Snowball sampling is a non-probability method that starts with a small initial group of respondents and then asks each respondent to recommend further respondents, so the sample grows wave by wave like a rolling snowball. It is commonly used to reach specific or hard-to-reach populations, for example when studying opinions within a particular community.
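The referral process can be sketched as a wave-by-wave expansion over a social network. This assumes a hypothetical network in which each of 200 people knows three others, chosen at random only for illustration. Sampling starts from two seed respondents and stops at a target sample size.

```r
set.seed(99)

# Hypothetical social network: each of 200 people can refer 3 acquaintances
n_people <- 200
contacts <- lapply(seq_len(n_people),
                   function(i) sample(setdiff(seq_len(n_people), i), 3))

snowball_sample <- function(contacts, seeds, target_size) {
  sampled <- seeds
  wave <- seeds
  while (length(sampled) < target_size && length(wave) > 0) {
    # every respondent in the current wave refers their contacts
    referrals <- unique(unlist(contacts[wave]))
    new_people <- setdiff(referrals, sampled)
    # keep only as many new respondents as are needed to reach the target
    new_people <- head(new_people, target_size - length(sampled))
    sampled <- c(sampled, new_people)
    wave <- new_people                 # the next wave is the newly recruited
  }
  sampled
}

snow <- snowball_sample(contacts, seeds = c(1, 2), target_size = 30)
length(snow)
```

Who ends up in the sample depends on the seeds and on who knows whom, which is why snowball samples are not probability samples.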

II.

1. The population of interest in the Current Population Survey (CPS) is the civilian noninstitutional population of the United States. There are no occupation criteria. The CPS records demographic information for household members of all ages, but its labor force questions apply only to people above a minimum age (official U.S. labor force statistics refer to people aged 16 and over). This lets the CPS cover a broad range of demographic and labor force characteristics across ages and occupations.

The CPS uses multistage sampling. In the first stage, a set of geographic areas called primary sampling units (PSUs) is selected so that the sample covers the entire United States. PSUs are often counties or groups of counties, but they can also be metropolitan areas or clusters of smaller counties. In the second stage, housing units are sampled within each selected PSU.
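A simplified two-stage version of this design can be simulated in base R. It assumes a hypothetical frame of 100 PSUs, each containing 50 to 500 housing units. Both stages use simple random sampling here. The actual CPS design is more complex: PSUs are stratified and selected with probability proportional to size.

```r
set.seed(2024)

# Hypothetical frame: 100 PSUs (e.g., counties), each with 50-500 housing units
psu_sizes <- sample(50:500, 100, replace = TRUE)
frame <- data.frame(
  psu          = rep(seq_along(psu_sizes), times = psu_sizes),
  housing_unit = sequence(psu_sizes)       # numbers units 1..size within each PSU
)

# Stage 1: select 10 PSUs by simple random sampling (a simplification)
stage1 <- sample(unique(frame$psu), 10)

# Stage 2: within each selected PSU, draw 20 housing units at random
stage2 <- do.call(rbind, lapply(stage1, function(p) {
  units <- frame[frame$psu == p, ]
  units[sample(nrow(units), 20), ]
}))

table(stage2$psu)
```

Only the selected PSUs need a full list of housing units, which is what makes multistage designs practical for a national survey.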

Yes, I do. The CPS is the official source of the monthly U.S. unemployment rate, it has remained a benchmark for many researchers over the years, and analyzing its data continues to yield valuable information.

if (!require("ipumsr")) stop("Reading IPUMS data into R requires the ipumsr package. It can be installed using the following command: install.packages('ipumsr')")

## Loading required package: ipumsr

## Warning: package 'ipumsr' was built under R version 4.3.2

# install.packages('ipumsr')  # run once if ipumsr is not yet installed

library(ipumsr)

# Read the DDI codebook (metadata), then the microdata it describes
ddi <- read_ipums_ddi("C:/cps_00002.xml")
data <- read_ipums_micro(ddi)

## Use of data from IPUMS CPS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# Drop INCWAGE = 99999999, the IPUMS "not in universe" (NIU) code
data1 <- data %>% filter(INCWAGE != 99999999)

library(ggplot2)

# Wage and salary income plotted against labor force status
ggplot(data = data1,
       mapping = aes(x = LABFORCE, y = INCWAGE)) +
  geom_point()

library(psych)

## 
## Attaching package: 'psych'

## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

describe(data1$INCWAGE)

##    vars       n     mean       sd median  trimmed     mad min     max   range
## X1    1 2154587 30310.41 56835.37  12000 30310.41 17791.2   0 2099999 2099999
##    skew kurtosis    se
## X1 8.15   121.31 38.72

data1 %>%
  group_by(LABFORCE) %>%
  summarise(mean(INCWAGE),
            median(INCWAGE))

## # A tibble: 3 × 3
##   LABFORCE                       `mean(INCWAGE)` `median(INCWAGE)`
##   <int+lbl>                                <dbl>             <dbl>
## 1 0 [NIU]                                 53947.             48000
## 2 1 [No, not in the labor force]           1963.                 0
## 3 2 [Yes, in the labor force]             46763.             33500

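People in the labor force have much higher wage and salary income (mean $46,763, median $33,500) than people not in the labor force (mean $1,963, median $0).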