The population of interest in the Current Population Survey (CPS) is the U.S. civilian noninstitutional population. Official labor force estimates cover individuals aged 16 and older. Beyond that age threshold there are no occupation or other eligibility criteria, since the survey aims to capture a comprehensive picture of the labor force and other demographic characteristics.
The CPS uses a sample of approximately 60,000 eligible households. These households are scientifically selected so that their characteristics closely mirror those of the U.S. civilian noninstitutional population. The sample is used to estimate population parameters and to produce headline statistics such as the unemployment rate and the employment-to-population ratio.
One interesting aspect of the CPS methodology is its rotating panel design. Each selected household is interviewed for four consecutive months, leaves the sample for eight months, and then returns for four more months (the 4-8-4 pattern). In any given month the sample is therefore made up of eight rotation groups at different stages of this schedule. Because most households are interviewed in consecutive months, the design allows short-run longitudinal analysis, tracking individuals and households over time, and provides valuable information on labor market dynamics.
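The rotation schedule can be sketched in a few lines of R. This is a minimal illustration that assumes the standard 4-8-4 pattern (four months in the sample, eight months out, four months in again) and equal-sized entry cohorts; `in_sample()` is a hypothetical helper written for this example, not a function from ipumsr or any CPS tooling.

```r
# Hypothetical helper: is a household that entered the sample in month
# `entry` interviewed in calendar month `month`? (months are plain integers)
in_sample <- function(entry, month) {
  k <- month - entry            # months elapsed since the first interview
  (k >= 0 & k <= 3) |           # first four consecutive interviews
    (k >= 12 & k <= 15)         # four more interviews after an eight-month break
}

# Interview months for a household that enters in month 1
schedule <- in_sample(entry = 1, month = 1:16)
which(schedule)

# Share of one month's cohorts that are interviewed again the next month
entries <- -20:20                         # cohorts entering around month 0
now <- in_sample(entries, month = 0)
nxt <- in_sample(entries, month = 1)
overlap <- sum(now & nxt) / sum(now)
overlap
```

Under these assumptions, eight cohorts are in the sample in any month and six of them return the following month, which is what makes month-to-month comparisons of the same households possible.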
Based on its methodology and design, the CPS aims to be a representative sample of the U.S. civilian noninstitutional population rather than of every U.S. resident: active-duty military personnel and people living in institutions are outside its scope. The sample is carefully selected to reflect the population’s characteristics, and statistical techniques such as weighting are employed to improve the reliability and accuracy of the estimates. However, no survey can perfectly capture its target population, and some sampling error or bias is inherent in any survey design. Nonetheless, the CPS is widely recognized as a reliable and authoritative source of labor force and demographic data in the United States.
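Weighting is the main technique that turns sample records into population estimates. The sketch below uses a small invented data frame rather than the real extract. It assumes the IPUMS conventions that `WTFINL` is the person-level final weight and that `EMPSTAT` codes 10 and 12 mean employed while 20 to 22 mean unemployed; these codes should be checked against the codebook for the extract in use.

```r
# Toy records (invented for illustration, not real CPS data)
toy <- data.frame(
  EMPSTAT = c(10, 10, 12, 21, 22, 34, 10, 21),
  WTFINL  = c(1500, 2000, 1000, 500, 500, 3000, 2500, 1000)
)

employed   <- toy$EMPSTAT %in% c(10, 12)
unemployed <- toy$EMPSTAT %in% c(20, 21, 22)

# Weighted counts: each record stands in for WTFINL people
n_employed   <- sum(toy$WTFINL[employed])
n_unemployed <- sum(toy$WTFINL[unemployed])

# Unemployment rate = unemployed / labor force (employed + unemployed)
unemp_rate_weighted   <- n_unemployed / (n_employed + n_unemployed)

# Same ratio from raw record counts, ignoring the weights
unemp_rate_unweighted <- sum(unemployed) / sum(employed | unemployed)

round(c(weighted = unemp_rate_weighted, unweighted = unemp_rate_unweighted), 3)
```

In this toy data the two rates differ noticeably because the unemployed records carry small weights, which illustrates why unweighted counts from the microdata should not be read as population shares.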
library(ipumsr)
ddi <- read_ipums_ddi("cps_00001.xml")
cps_data <- read_ipums_micro(ddi)
## Use of data from IPUMS CPS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.
data <- cps_data
summary(cps_data)
## YEAR SERIAL MONTH HWTFINL
## Min. :2020 Min. : 1 Min. :3 Min. : 216.5
## 1st Qu.:2020 1st Qu.:20382 1st Qu.:3 1st Qu.: 1493.1
## Median :2020 Median :40984 Median :3 Median : 3279.8
## Mean :2020 Mean :41883 Mean :3 Mean : 2962.4
## 3rd Qu.:2021 3rd Qu.:61803 3rd Qu.:3 3rd Qu.: 4110.5
## Max. :2021 Max. :91500 Max. :3 Max. :14981.6
## NA's :157959
## CPSID ASECFLAG REGION STATEFIP
## Min. :0.000e+00 Min. :1.000 Min. :0 Min. : 1.00
## 1st Qu.:2.019e+13 1st Qu.:1.000 1st Qu.:0 1st Qu.: 9.00
## Median :2.020e+13 Median :1.000 Median :0 Median :19.00
## Mean :1.616e+13 Mean :1.405 Mean :0 Mean :19.48
## 3rd Qu.:2.020e+13 3rd Qu.:2.000 3rd Qu.:0 3rd Qu.:26.00
## Max. :2.021e+13 Max. :2.000 Max. :0 Max. :94.00
## NA's :107334 NA's :107334
## COUNTY METRO INDIVIDCC FAMINC
## Min. : 0 Min. :0 Min. :1.000 Min. :65
## 1st Qu.:25055 1st Qu.:0 1st Qu.:3.000 1st Qu.:65
## Median :50000 Median :0 Median :5.000 Median :65
## Mean :50137 Mean :0 Mean :4.597 Mean :65
## 3rd Qu.:75480 3rd Qu.:0 3rd Qu.:7.000 3rd Qu.:65
## Max. :99990 Max. :0 Max. :8.000 Max. :65
## NA's :107334 NA's :107334 NA's :107334
## PERNUM WTFINL CPSIDV CPSIDP
## Min. : 1.0 Min. :1.041e+09 Min. :1.010e+10 Min. :0.000e+00
## 1st Qu.: 3.0 1st Qu.:1.263e+09 1st Qu.:3.044e+10 1st Qu.:2.019e+03
## Median :22.0 Median :2.013e+09 Median :6.034e+10 Median :2.020e+03
## Mean :14.7 Mean :1.912e+09 Mean :1.521e+14 Mean :1.225e+11
## 3rd Qu.:23.0 3rd Qu.:2.363e+09 3rd Qu.:3.023e+14 3rd Qu.:2.404e+11
## Max. :24.0 Max. :3.485e+09 Max. :9.063e+14 Max. :2.471e+12
##
## AGE SEX RACE MARST POPSTAT
## Min. : 0.000 Min. :0 Min. : 0 Min. :0.0 Min. :0.000
## 1st Qu.: 1.000 1st Qu.:0 1st Qu.: 43 1st Qu.:0.0 1st Qu.:0.000
## Median : 2.000 Median :0 Median :254 Median :3.0 Median :3.000
## Mean : 3.629 Mean :0 Mean :277 Mean :3.6 Mean :3.615
## 3rd Qu.: 3.000 3rd Qu.:0 3rd Qu.:481 3rd Qu.:6.0 3rd Qu.:6.000
## Max. :12.000 Max. :0 Max. :711 Max. :9.0 Max. :9.000
##
## NCHILD NCHLT5 EMPSTAT LABFORCE
## Min. :0.0000000 Min. :0.000 Min. :0 Min. :0.00
## 1st Qu.:0.0000000 1st Qu.:1.000 1st Qu.:0 1st Qu.:1.00
## Median :0.0000000 Median :1.000 Median :0 Median :1.00
## Mean :0.0007991 Mean :1.696 Mean :0 Mean :1.56
## 3rd Qu.:0.0000000 3rd Qu.:2.000 3rd Qu.:0 3rd Qu.:2.00
## Max. :1.0000000 Max. :9.000 Max. :1 Max. :9.00
## NA's :107334 NA's :107334
## IND CLASSWKR UHRSWORKT WHYUNEMP
## Min. : 0 Min. : 0.00 Min. : 1.000 Min. :0.00000
## 1st Qu.:2542 1st Qu.:20.00 1st Qu.: 1.000 1st Qu.:0.00000
## Median :4843 Median :40.00 Median : 2.000 Median :0.00000
## Mean :4953 Mean :45.02 Mean : 2.738 Mean :0.09761
## 3rd Qu.:7394 3rd Qu.:70.00 3rd Qu.: 3.000 3rd Qu.:0.00000
## Max. :9999 Max. :90.00 Max. :12.000 Max. :6.00000
## NA's :107334 NA's :107334
## WNLOOK WKSTAT EDUC COVIDUNAW
## Min. : 0.0 Min. :11.00 Min. : 1.00 Min. :11.00
## 1st Qu.:118.0 1st Qu.:11.00 1st Qu.: 1.00 1st Qu.:11.00
## Median :148.0 Median :21.00 Median : 5.00 Median :13.00
## Mean :176.8 Mean :16.78 Mean : 37.85 Mean :15.42
## 3rd Qu.:185.0 3rd Qu.:21.00 3rd Qu.: 6.00 3rd Qu.:13.00
## Max. :777.0 Max. :28.00 Max. :529.00 Max. :30.00
##
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.0 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
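library(ggplot2)
# 99999999 is the IPUMS not-in-universe code used by income variables;
# LABFORCE is coded 0, 1, or 2, so this filter keeps every non-missing row
data1 <- data %>% filter(LABFORCE != 99999999)
# Scatter plot of LABFORCE against itself
ggplot(data = data1,
       mapping = aes(x = LABFORCE,
                     y = LABFORCE)) +
  geom_point()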
Income among the unemployed is consistently lower on average. This matches expectations: people who are unemployed typically rely on unemployment benefits or sporadic part-time work rather than regular full-time wages. Income among people who are not in the labor force varies considerably. This group is heterogeneous, including retirees, students, and potentially people living on investment income, which may account for the spread. Employed individuals have higher incomes than the unemployed, consistent with earning wages from regular work. Their reported incomes reflect typical wage levels and indicate a stable source of employment-derived earnings.
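A comparison of this kind can be computed directly as a grouped summary. The sketch below is only an illustration on invented records: it assumes an income variable such as `INCWAGE` has been added to the extract (it does not appear in the summary above), that 99999999 is its not-in-universe code, and that `EMPSTAT` codes 10 and 12 mean employed, 20 to 22 unemployed, and 30 to 36 not in the labor force.

```r
# Toy records (invented for illustration, not real CPS data)
toy <- data.frame(
  EMPSTAT = c(10, 10, 12, 21, 22, 34, 32, 10),
  INCWAGE = c(52000, 38000, 45000, 9000, 99999999, 0, 12000, 61000)
)

# Collapse detailed EMPSTAT codes into three groups; other codes stay NA
toy$status <- NA_character_
toy$status[toy$EMPSTAT %in% c(10, 12)] <- "Employed"
toy$status[toy$EMPSTAT %in% 20:22]     <- "Unemployed"
toy$status[toy$EMPSTAT %in% 30:36]     <- "Not in labor force"

# Drop unclassified records and the not-in-universe placeholder
valid <- toy[!is.na(toy$status) & toy$INCWAGE != 99999999, ]

# Typical level and spread of wage income within each group
median_by_status <- tapply(valid$INCWAGE, valid$status, median)
iqr_by_status    <- tapply(valid$INCWAGE, valid$status, IQR)
median_by_status
```

Dropping the placeholder code before summarising matters: left in, a single 99999999 value would dominate any mean. A boxplot of `INCWAGE` by `status`, for example with `geom_boxplot()`, would show the same comparison graphically.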
Yes, wages vary by region, for example across states.
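One way to check this is a weighted mean wage per state. The sketch below runs on invented numbers that say nothing about actual states. It assumes a wage variable such as `INCWAGE` is available, uses `STATEFIP` as the state identifier, and uses a generic person weight column `wt` as a stand-in, since the appropriate IPUMS weight depends on the variable being analysed.

```r
# Toy records (invented for illustration, not real CPS data)
toy <- data.frame(
  STATEFIP = c(6, 6, 36, 36, 48, 48),
  INCWAGE  = c(60000, 40000, 70000, 50000, 45000, 35000),
  wt       = c(1000, 3000, 2000, 2000, 1000, 1000)
)

# Weighted mean wage within each state (names are the STATEFIP codes)
by_state <- sapply(split(toy, toy$STATEFIP),
                   function(d) weighted.mean(d$INCWAGE, d$wt))
by_state

# Gap between the highest- and lowest-wage state in the toy data
wage_gap <- max(by_state) - min(by_state)
wage_gap
```

A nonzero gap in real data would still need a measure of sampling error before concluding that two states differ, because state-level CPS samples are much smaller than the national sample.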