The Current Population Survey (CPS) is a monthly survey of households conducted by the Bureau of Census for the Bureau of Labor Statistics. It provides a comprehensive body of data on the labor force, employment, unemployment, persons not in the labor force, hours of work, earnings, and other demographic and labor force characteristics.
Explore the official Bureau of Labor Statistics (BLS) CPS website and
the IPUMS CPS website.
What is the population of interest for the Current Population
Survey? Be precise – are there any age criteria, occupation criteria, et
cetera?
The civilian noninstitutional population age 16 and older in the 50 states and the District of Columbia.
Active duty members of the Armed Forces are excluded.
Also excluded are people residing in any type of institution such as correctional institutions or long-term care facilities.
What is the sample used to estimate the population parameters? Be precise - how many households and/or people are included, and is there anything interesting or unique about the survey methodology that you found?
The U.S. Census Bureau conducts the Current Population Survey (CPS) each month as a sample survey of about 60,000 eligible households.
All of the counties and independent cities in the country are first grouped into approximately 2,000 geographic areas (sampling units). The Census Bureau then designs and selects a sample of about 800 of these geographic areas to represent each state and the District of Columbia.
The CPS is a probability sample, and cluster sampling is used to
create the geographic areas from which households are selected.
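The two-stage idea described above (first sample geographic areas, then sample households within them) can be sketched on toy data. This is only an illustration under stated assumptions: the frame, the 50 households per area, and the 5 households drawn per area are hypothetical, and the actual CPS design is more elaborate (for example, it stratifies areas before selecting them).

```r
set.seed(42)  # reproducible toy example

# Hypothetical frame: 2,000 geographic areas with 50 households each
n_areas <- 2000
frame <- data.frame(
  area_id      = rep(seq_len(n_areas), each = 50),
  household_id = seq_len(n_areas * 50)
)

# Stage 1: draw 800 of the 2,000 areas at random
sampled_areas <- sample(seq_len(n_areas), size = 800)

# Stage 2: draw 5 households at random within each sampled area
stage1 <- frame[frame$area_id %in% sampled_areas, ]
picked <- unlist(lapply(split(stage1$household_id, stage1$area_id),
                        function(ids) sample(ids, size = 5)))
cluster_sample <- frame[frame$household_id %in% picked, ]

nrow(cluster_sample)                    # 800 areas x 5 households = 4000
length(unique(cluster_sample$area_id))  # 800
```

Sampling whole areas first keeps fieldwork concentrated in fewer places, at the cost of households within an area tending to resemble one another.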
Do you think the CPS is a representative sample of the entire US population?
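Yes, it is considered representative of the US labor market, although it covers the civilian noninstitutional population age 16 and older rather than every US resident.
The sample size is chosen to meet reliability requirements, so the survey is a trustworthy source for assessing the unemployment rate at the national and state levels.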
CPS data can be accessed from a few places. One easy way is to create an account at IPUMS CPS and download the data. Go to IPUMS CPS and create an extract containing all the states and the variables INCWAGE (wage and salary income) and LABFORCE (labor force status).
# NOTE: To load data, you must download both the extract's data and the DDI
# and also set the working directory to the folder with these files (or change the path below).
if (!require("ipumsr")) stop("Reading IPUMS data into R requires the ipumsr package. It can be installed using the following command: install.packages('ipumsr')")
## Loading required package: ipumsr
# install.packages('ipumsr')
library("ipumsr")
remove(list = ls())
ddi <- read_ipums_ddi("cps_00043.xml")
data <- read_ipums_micro(ddi)
## Use of data from IPUMS CPS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.
library("ggplot2")
ggplot(data = data,
mapping = aes(x = LABFORCE,y = INCWAGE))+geom_point()
## Warning: Removed 2602318 rows containing missing values or values outside the scale
## range (`geom_point()`).
# 99999999 = N.I.U. (Not in Universe)
# 99999998 = Missing (1962-1966 only)
df <- data # duplicate data
df <- df[c(1,3,12:16)]
df$inc <- df$INCWAGE
library("psych")
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
describe(df$inc)
## vars n mean sd median trimmed mad min max range skew
## X1 1 474234 20945796 40657777 33000 20945796 48925.8 0 1e+08 1e+08 1.43
## kurtosis se
## X1 0.05 59040.13
# Recode the N.I.U. and missing sentinel codes to NA
df$inc[df$inc == 99999999] <- NA
df$inc[df$inc == 99999998] <- NA
describe(df$inc)
## vars n mean sd median trimmed mad min max range skew
## X1 1 375035 35487.1 66514.04 15000 35487.1 22239 0 2099999 2099999 7.67
## kurtosis se
## X1 103.9 108.61
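The effect of the recode above can be seen on a handful of made-up values. This is a minimal sketch with hypothetical incomes; only the two sentinel codes (99999999 and 99999998) come from the notes above.

```r
# Hypothetical wage values plus the two sentinel codes noted above
inc_raw <- c(0, 15000, 33000, 52000, 99999999, 99999998)

# Replace the sentinel codes with NA so they do not enter the statistics
niu_codes <- c(99999999, 99999998)
inc_clean <- replace(inc_raw, inc_raw %in% niu_codes, NA)

mean(inc_raw)                  # dominated by the sentinel codes
mean(inc_clean, na.rm = TRUE)  # 25000
```

Left in place, the sentinel codes are treated as real incomes of roughly 100 million, which is why the mean and standard deviation in the first `describe()` output are so large.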
ipums_val_labels(df$LABFORCE)
## # A tibble: 3 × 2
## val lbl
## <int> <chr>
## 1 0 NIU
## 2 1 No, not in the labor force
## 3 2 Yes, in the labor force
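# LABFORCE is a labelled numeric; convert it to character so the x-axis is discrete
ggplot(data = df,
mapping = aes(x = as.character(LABFORCE), y = inc)
) + geom_point() + scale_x_discrete(name = "LABFORCE", labels = c("0" = "Not in Universe", "1" = "No, not in the Labor Force", "2" = "Yes, In the Labor Force"))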
## Warning: Removed 2701517 rows containing missing values or values outside the scale
## range (`geom_point()`).
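# Same plot with short labels; x is converted to character so the discrete scale applies
ggplot(data = df,
mapping = aes(x = as.character(LABFORCE), y = inc)
) + geom_point() + scale_x_discrete(name = "LABFORCE", breaks = c("0", "1", "2"),
labels = c("NIU", "NILF", "ILF"))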
## Warning: Removed 2701517 rows containing missing values or values outside the scale
## range (`geom_point()`).
df$labforce <- as.character(df$LABFORCE)
df$labforce[df$labforce=="0"] <- "NIU"
df$labforce[df$labforce=="1"] <- "NILF"
df$labforce[df$labforce=="2"] <- "ILF"
ggplot(data = df,
mapping = aes(x = labforce, y = inc)) + geom_point() + labs(title = "Wage and Salary Income by Labor Force Status \n Current Population Survey",
x = "Labor Force Status", y = "Wage and Salary Income")
## Warning: Removed 2701517 rows containing missing values or values outside the scale
## range (`geom_point()`).
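As an alternative to recoding the codes one at a time, base R's `factor()` can attach the labels in a single step. The sketch below uses a small hypothetical vector of codes rather than the real extract; the code-to-label mapping (0 = NIU, 1 = NILF, 2 = ILF) follows the value labels shown earlier.

```r
# Hypothetical LABFORCE codes (0 = NIU, 1 = not in labor force, 2 = in labor force)
labforce_code <- c(0, 1, 2, 2, 1, 0)

# Map codes to short labels in one step; levels fix the display order
labforce_f <- factor(labforce_code,
                     levels = c(0, 1, 2),
                     labels = c("NIU", "NILF", "ILF"))

table(labforce_f)  # two observations per category in this toy vector
```

A factor also controls the order of categories on a plot axis, whereas a character vector is sorted alphabetically.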
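Average wages are higher for people in the labor force (ILF) than for people not in the labor force (NILF). Note that LABFORCE separates labor force participants from non-participants; it does not separate the employed from the unemployed.
Average wages for people not in the labor force are the lowest.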
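A scatterplot shows the spread of incomes but not the averages, so a comparison like the one above is easier to check with group means. The sketch below uses a small hypothetical data frame with the same column names as `df` (`labforce` and `inc`); the values are made up and are not results from the CPS extract.

```r
# Hypothetical data with the same column names used above
toy <- data.frame(
  labforce = c("ILF", "ILF", "ILF", "NILF", "NILF", "NIU"),
  inc      = c(40000, 60000, NA, 0, 10000, NA)
)

# Mean income within each labor force group, ignoring missing values
group_means <- tapply(toy$inc, toy$labforce, mean, na.rm = TRUE)
group_means
```

A group whose incomes are all missing (here NIU) returns `NaN`, which is a useful reminder that it has no usable wage data.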