The population of interest is the civilian noninstitutionalized population aged 16 and older. The CPS excludes members of the military and people who are in prison or confined to another institution. The population is split between working and non-working. For example, students and housewives are considered non-working because they are not actively looking for jobs. Those considered working are people with full-time or part-time jobs.
The sample is 62,000 households.
Yes, I do think the CPS is a representative sample of the entire US population. It takes many different aspects into account. One example is the set of categories it has for the “non-working” population: absent from job, on layoff awaiting recall, actively looking, and none of the above. This allows the data to be split into many different categories.
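One way a sample of households can stand in for the whole population is through survey weights, where each respondent counts for a certain number of people. The sketch below uses a small made-up data frame to compare an unweighted and a weighted labor force share. The values and the names `toy`, `in_lf`, and `wt` are hypothetical and are not taken from the CPS.

```r
# Toy data, not real CPS records: in_lf is 1 if the person is in the labor
# force, and wt is a hypothetical survey weight (people represented).
toy <- data.frame(
  in_lf = c(1, 1, 0, 1, 0),
  wt    = c(1000, 1500, 3000, 500, 4000)
)

# Unweighted share: every respondent counts equally.
unweighted_share <- mean(toy$in_lf)

# Weighted share: each respondent counts in proportion to the weight.
weighted_share <- weighted.mean(toy$in_lf, w = toy$wt)

print(c(unweighted = unweighted_share, weighted = weighted_share))
```

In this toy case the two shares differ because the respondents outside the labor force carry larger weights. In the CPS extract used in this report, the columns that likely play the role of `wt` are WTFINL and ASECWT.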
library(ipumsr)
ddi <- read_ipums_ddi("cps_00001.xml")
data <- read_ipums_micro(ddi)
## Use of data from IPUMS CPS is subject to conditions including that users should
## cite the data appropriately. Use command `ipums_conditions()` for more details.
summary(data)
## YEAR SERIAL MONTH HWTFINL
## Min. :2019 Min. : 1 Min. : 1.000 Min. : 0
## 1st Qu.:2021 1st Qu.:17959 1st Qu.: 3.000 1st Qu.: 1551
## Median :2021 Median :35793 Median : 5.000 Median : 3368
## Mean :2021 Mean :36587 Mean : 5.582 Mean : 3071
## 3rd Qu.:2022 3rd Qu.:54426 3rd Qu.: 9.000 3rd Qu.: 4269
## Max. :2023 Max. :94633 Max. :12.000 Max. :17273
## NA's :654335
## CPSID ASECFLAG ASECWTH PERNUM
## Min. :0.000e+00 Min. :1.0 Min. : 109.7 Min. : 1.000
## 1st Qu.:2.020e+13 1st Qu.:1.0 1st Qu.: 970.7 1st Qu.: 1.000
## Median :2.021e+13 Median :1.0 Median :1888.0 Median : 2.000
## Mean :1.882e+13 Mean :1.2 Mean :1937.2 Mean : 2.128
## 3rd Qu.:2.021e+13 3rd Qu.:1.0 3rd Qu.:2563.5 3rd Qu.: 3.000
## Max. :2.023e+13 Max. :2.0 Max. :9975.4 Max. :16.000
## NA's :2394449 NA's :2602318
## WTFINL CPSIDP ASECWT LABFORCE
## Min. : 0 Min. :0.000e+00 Min. : 86.5 Min. :0.000
## 1st Qu.: 1554 1st Qu.:2.020e+13 1st Qu.: 990.2 1st Qu.:1.000
## Median : 3357 Median :2.021e+13 Median : 1911.4 Median :1.000
## Mean : 3140 Mean :1.882e+13 Mean : 2000.1 Mean :1.303
## 3rd Qu.: 4349 3rd Qu.:2.021e+13 3rd Qu.: 2681.4 3rd Qu.:2.000
## Max. :31157 Max. :2.023e+13 Max. :17421.8 Max. :2.000
## NA's :654335 NA's :2602318
## INCWAGE
## Min. : 0
## 1st Qu.: 0
## Median : 32000
## Mean :21125081
## 3rd Qu.: 124000
## Max. :99999999
## NA's :2602318
library("ggplot2")
ggplot(data = data,
       mapping = aes(x = LABFORCE, y = INCWAGE)) +
  geom_point()
## Don't know how to automatically pick scale for object of type
## <haven_labelled/vctrs_vctr/integer>. Defaulting to continuous.
## Don't know how to automatically pick scale for object of type
## <haven_labelled/vctrs_vctr/double>. Defaulting to continuous.
## Warning: Removed 2602318 rows containing missing values (`geom_point()`).
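# Keep only the columns needed below (positions 1, 3, 12, 13 in the extract)
data <- data[c("YEAR", "MONTH", "LABFORCE", "INCWAGE")]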
library("psych")
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
describe(data$INCWAGE)
## vars n mean sd median trimmed mad min max range skew
## X1 1 654335 21125081 40786003 32000 21125081 47443.2 0 1e+08 1e+08 1.42
## kurtosis se
## X1 0.01 50420.96
# Recode INCWAGE's special codes (NIU / missing flags, not dollar amounts) to NA
data$INCWAGE[data$INCWAGE == 99999999] <- NA
data$INCWAGE[data$INCWAGE == 99999998] <- NA
describe(data$INCWAGE)
## vars n mean sd median trimmed mad min max range skew
## X1 1 516286 34825.21 65186.22 15000 34825.21 22239 0 2099999 2099999 7.8
## kurtosis se
## X1 107.24 90.72
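The recoding step above explains why the mean falls from about 21 million to about 35 thousand: the special codes are very large numbers that get averaged in as if they were incomes. A minimal sketch on a made-up vector shows the same effect. The values and the names `wages`, `special_codes`, and `wages_clean` are hypothetical.

```r
# Hypothetical wage values; 99999999 and 99999998 stand in for the
# special codes that were recoded to NA above.
wages <- c(0, 32000, 58000, 99999999, 99999998, NA)

# Codes treated as "not a real dollar amount" (an assumption for this sketch).
special_codes <- c(99999998, 99999999)

# %in% returns FALSE for NA, so existing missing values are left alone.
wages_clean <- replace(wages, wages %in% special_codes, NA)

mean(wages, na.rm = TRUE)        # inflated by the special codes
mean(wages_clean, na.rm = TRUE)  # mean of the three real amounts
```

Putting the codes in one vector keeps the recoding to a single step and makes it easy to add another code later.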
ggplot(data = data,
       mapping = aes(x = LABFORCE, y = INCWAGE)) +
  geom_point() +
  labs(x = "Labor Force Status", y = "Wage and Salary Annual Income") +
  scale_x_discrete()
## Don't know how to automatically pick scale for object of type
## <haven_labelled/vctrs_vctr/double>. Defaulting to continuous.
## Warning: Removed 2740367 rows containing missing values (`geom_point()`).
This graph of wage and salary income versus labor force status does make sense. For labor force status, 0 corresponds to people who are not in the universe (NIU). This includes members of the military and children aged 14 and under; people in prison are not surveyed by the CPS at all. For that reason, it makes sense that their wage and salary income is much lower than that of people coded 1 (not in the labor force) or 2 (in the labor force). However, there are a few outliers in the 0 category. I wonder if the CPS sometimes misclassifies people who were previously in the military but no longer are. 2 corresponds to people in the labor force, which covers those working full time or part time as well as those looking for work. It makes sense that they would have the greatest wage and salary annual income.
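The comparison described above can also be checked numerically. The sketch below labels the three status codes and computes a median income per group on made-up data. The data frame `toy`, its values, and the label text are hypothetical; only the 0/1/2 coding follows the discussion above.

```r
# Toy data, not real CPS records. Codes follow the reading used above:
# 0 = NIU, 1 = not in the labor force, 2 = in the labor force.
toy <- data.frame(
  LABFORCE = c(0, 0, 1, 1, 1, 2, 2, 2),
  INCWAGE  = c(0, 0, 0, 5000, 12000, 30000, 45000, 80000)
)

# Turn the numeric codes into a labelled factor so the output is readable.
toy$status <- factor(
  toy$LABFORCE,
  levels = c(0, 1, 2),
  labels = c("NIU", "Not in labor force", "In labor force")
)

# Median income per group; medians are less sensitive to outliers than means.
median_by_status <- tapply(toy$INCWAGE, toy$status, median)
print(median_by_status)
```

A labelled factor like `status` could also be used as the x variable in a plot, for example with `geom_boxplot()`, so that the axis shows category names instead of 0, 1, and 2.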