I
Probability sampling is the process of choosing a sample from a population so that each person has a known, non-zero chance of being included. This makes the sample more likely to be representative of the population and allows reliable statistical conclusions to be drawn.
In non-probability sampling techniques, which do not use random selection, the probability that a specific member of the population will be included in the sample is unknown. These strategies are frequently used when probability sampling is difficult or impractical.
Simple random sampling is a viable option if we have an exhaustive list of the whole population and we want each person to have an equal probability of being included.
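As a rough illustration of simple random sampling, the sketch below draws 50 members from a made-up list of 1,000. The `frame` data frame and the sample size are assumptions for the example, not part of the assignment data.

```r
# Hypothetical sampling frame: a complete list of 1,000 population members.
set.seed(42)  # fixed seed so the draw is reproducible
frame <- data.frame(id = 1:1000)

# Draw 50 rows without replacement; every member has the same
# inclusion probability, n / N = 50 / 1000 = 0.05.
srs <- frame[sample(nrow(frame), size = 50), , drop = FALSE]
nrow(srs)
```

Because `sample()` draws without replacement by default, no member can be selected twice.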
When the population is heterogeneous and we wish to guarantee representation from several subgroups, we should use stratified sampling. Because each subgroup is sampled separately, it gives more precise estimates for every subgroup.
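A minimal sketch of proportionate stratified sampling with dplyr follows. The `population` data frame, the `region` strata, and the 10 percent sampling fraction are all hypothetical.

```r
library(dplyr)

set.seed(42)
# Hypothetical population with three strata of unequal size.
population <- data.frame(
  id = 1:1000,
  region = rep(c("North", "South", "West"), times = c(600, 300, 100))
)

# Proportionate stratified sample: draw 10% of the rows within each region.
strat_sample <- population %>%
  group_by(region) %>%
  slice_sample(prop = 0.10) %>%
  ungroup()

# Number of sampled rows per stratum.
count(strat_sample, region)
```

Sampling within each group guarantees that even the smallest stratum appears in the sample, which a single simple random draw does not.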
The first survey is preferable because it uses a cross-sectional design, which gathers information from a sample of people at a single point in time. This design is frequently used to understand the features of a population at a particular time, for example in public opinion polling. In the second design, information is gathered from the same people at several points in time. While that is helpful for examining shifts and patterns within a group, its participants are chosen based on judgment and level of experience, which makes it more appropriate for qualitative research.
II
Every month, the U.S. Bureau of the Census conducts the Current Population Survey (CPS), which is completed by more than 65,000 households.
They neglected to include a number of variables that people would take into account before choosing to enter the labor market, such as the type of work, hours worked, location, and compensation, among others.
Among the major worker groups, the unemployment rates were: adult men (3.7 percent), adult women (3.3 percent), teenagers (13.2 percent), Whites (3.5 percent), Blacks (5.8 percent), Asians (3.1 percent), and Hispanics (4.8 percent).
The employment-population ratio for 2023 is 60.2 percent.
The threshold for a statistically significant change in the household survey is about 600,000.
One thing that interests me is that Whites and Asians have lower unemployment rates than Blacks and Hispanics.
I think so. The Current Population Survey (CPS) is conducted monthly and uses a rotating panel survey design to estimate population parameters and headline statistics such as the unemployment rate and the employment-to-population ratio.
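The CPS rotation is commonly described as 4-8-4: a household is interviewed for four consecutive months, leaves the sample for eight months, then returns for four more. The sketch below only illustrates that pattern for a single household; the variable names are made up.

```r
# Months counted from a household's first interview.
months_since_entry <- 1:16

# 4-8-4 pattern: in the sample for months 1-4, out for months 5-12,
# back in for months 13-16.
in_sample <- months_since_entry <= 4 | months_since_entry >= 13

data.frame(month = months_since_entry, in_sample)
```

Each household is therefore interviewed eight times in total, which is what lets the survey track month-to-month and year-to-year change among the same respondents.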
4.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ipumsr)
5.
cps_ddi <- read_ipums_ddi("/Users/timyang/Downloads/cps_00001.xml")
cps_data <- read_ipums_micro(cps_ddi, data_file = "/Users/timyang/Downloads/cps_00001.dat", verbose = FALSE)
cps_data
## # A tibble: 4,198,268 × 14
## YEAR SERIAL MONTH HWTFINL CPSID ASECFLAG ASECWTH PERNUM WTFINL CPSIDP
## <dbl> <dbl> <int+lbl> <dbl> <dbl> <int+lb> <dbl> <dbl> <dbl> <dbl>
## 1 2019 4 3 [March] NA 2.02e13 1 [ASEC] 2032. 1 NA 2.02e13
## 2 2019 6 3 [March] NA 2.02e13 1 [ASEC] 1232. 1 NA 2.02e13
## 3 2019 7 3 [March] NA 2.02e13 1 [ASEC] 1209. 1 NA 2.02e13
## 4 2019 8 3 [March] NA 2.02e13 1 [ASEC] 1146. 1 NA 2.02e13
## 5 2019 8 3 [March] NA 2.02e13 1 [ASEC] 1146. 2 NA 2.02e13
## 6 2019 13 3 [March] NA 2.02e13 1 [ASEC] 1588. 1 NA 2.02e13
## 7 2019 15 3 [March] NA 2.02e13 1 [ASEC] 1583. 1 NA 2.02e13
## 8 2019 18 3 [March] NA 2.02e13 1 [ASEC] 981. 1 NA 2.02e13
## 9 2019 18 3 [March] NA 2.02e13 1 [ASEC] 981. 2 NA 2.02e13
## 10 2019 20 3 [March] NA 2.02e13 1 [ASEC] 1539. 1 NA 2.02e13
## # ℹ 4,198,258 more rows
## # ℹ 4 more variables: CPSIDV <dbl>, ASECWT <dbl>, LABFORCE <int+lbl>,
## # INCWAGE <dbl+lbl>
cps_ddi
## An IPUMS DDI for IPUMS CPS with 14 variables
## Extract 'cps_00001.dat' created on 2023-11-07
## User notes:
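# Count records in each LABFORCE category (as_factor() turns IPUMS value labels into factor levels)
labforce_counts <- table(haven::as_factor(cps_data$LABFORCE))
barplot(labforce_counts, main = "Labor Force Status (LABFORCE) Distribution")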
summary(cps_data)
## YEAR SERIAL MONTH HWTFINL
## Min. :2019 Min. : 1 Min. : 1.000 Min. : 0
## 1st Qu.:2021 1st Qu.:18022 1st Qu.: 3.000 1st Qu.: 1567
## Median :2022 Median :35856 Median : 5.000 Median : 3392
## Mean :2022 Mean :36550 Mean : 5.481 Mean : 3107
## 3rd Qu.:2022 3rd Qu.:54342 3rd Qu.: 8.000 3rd Qu.: 4315
## Max. :2023 Max. :94633 Max. :12.000 Max. :18077
## NA's :800468
## CPSID ASECFLAG ASECWTH PERNUM
## Min. :0.000e+00 Min. :1.0 Min. : 110 Min. : 1.000
## 1st Qu.:2.020e+13 1st Qu.:1.0 1st Qu.: 998 1st Qu.: 1.000
## Median :2.021e+13 Median :1.0 Median : 1923 Median : 2.000
## Mean :1.890e+13 Mean :1.3 Mean : 1982 Mean : 2.125
## 3rd Qu.:2.022e+13 3rd Qu.:2.0 3rd Qu.: 2627 3rd Qu.: 3.000
## Max. :2.023e+13 Max. :2.0 Max. :10925 Max. :16.000
## NA's :3093585 NA's :3397800
## WTFINL CPSIDP CPSIDV ASECWT
## Min. : 0 Min. :0.000e+00 Min. :0.000e+00 Min. : 86
## 1st Qu.: 1572 1st Qu.:2.020e+13 1st Qu.:2.020e+14 1st Qu.: 1018
## Median : 3388 Median :2.021e+13 Median :2.021e+14 Median : 1951
## Mean : 3182 Mean :1.890e+13 Mean :1.890e+14 Mean : 2048
## 3rd Qu.: 4402 3rd Qu.:2.022e+13 3rd Qu.:2.022e+14 3rd Qu.: 2749
## Max. :44748 Max. :2.023e+13 Max. :2.023e+14 Max. :17422
## NA's :800468 NA's :3397800
## LABFORCE INCWAGE
## Min. :0.000 Min. : 0
## 1st Qu.:1.000 1st Qu.: 0
## Median :1.000 Median : 33280
## Mean :1.306 Mean :20957460
## 3rd Qu.:2.000 3rd Qu.: 125000
## Max. :2.000 Max. :99999999
## NA's :3397800
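The INCWAGE mean of about 21 million and maximum of 99,999,999 in the summary above suggest that placeholder codes are being averaged in as if they were dollar amounts. The sketch below shows one way to recode them to `NA`. It uses a made-up `toy` table rather than the real extract, and it assumes that 99999999 means "not in universe" and 99999998 means "missing", which should be checked against the value labels in the extract's codebook.

```r
library(dplyr)

# Toy stand-in for cps_data; the values are invented for illustration.
toy <- tibble(
  LABFORCE = c(2, 2, 1, 0, 2),
  INCWAGE  = c(52000, 31000, 0, 99999999, 99999998)
)

# Assumption: 99999998 and 99999999 are placeholder codes, not incomes.
toy_clean <- toy %>%
  mutate(INCWAGE = if_else(INCWAGE >= 99999998, NA_real_, INCWAGE))

mean(toy$INCWAGE)                      # dominated by the placeholder codes
mean(toy_clean$INCWAGE, na.rm = TRUE)  # mean of the real dollar amounts only
```

On the real data, the labelled column may first need its labels removed (for example with `zap_labels()`) before the comparison.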
6.
# Line plot of wage and salary income (INCWAGE) against labor force status (LABFORCE)
plot(cps_data$LABFORCE, cps_data$INCWAGE, type = "l", col = "blue",
     xlab = "Labor force status (LABFORCE)", ylab = "Wage and salary income (INCWAGE)", main = "Line Plot")
# Scatter plot of labor force status (LABFORCE) against wage and salary income (INCWAGE)
plot(cps_data$INCWAGE, cps_data$LABFORCE, pch = 16, col = "blue",
     xlab = "Wage and salary income (INCWAGE)", ylab = "Labor force status (LABFORCE)", main = "Scatter Plot")
One evident trend is the rise in the labor force as a result of population growth. In addition, the percentage of people in the labor force and the area’s income level are negatively correlated.
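One way to check a trend like this is to compute the weighted share of people in the labor force for each year. The sketch below does that on a made-up `toy` table, not the real extract. It assumes LABFORCE is coded 2 for "in the labor force", 1 for "not in the labor force", and 0 for "not in universe", and that WTFINL is the appropriate person weight for the monthly records.

```r
library(dplyr)

# Toy stand-in for cps_data; the values are invented for illustration.
toy <- tibble(
  YEAR     = c(2019, 2019, 2019, 2020, 2020, 2020),
  LABFORCE = c(2, 1, 0, 2, 2, 1),
  WTFINL   = c(1500, 2000, 1800, 1600, 1400, 2100)
)

lf_by_year <- toy %>%
  # Keep only in-universe records that have a weight.
  filter(LABFORCE %in% c(1, 2), !is.na(WTFINL)) %>%
  group_by(YEAR) %>%
  summarise(
    # Weighted share of people who are in the labor force.
    lf_rate = sum(WTFINL[LABFORCE == 2]) / sum(WTFINL),
    .groups = "drop"
  )

lf_by_year
```

Plotting `lf_rate` against `YEAR` would show the trend directly, rather than inferring it from a plot of individual records.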