setwd("/Users/diegodearmas/Downloads")
install.packages("ipumsr", repos = "https://cloud.r-project.org")
library(ipumsr)
install.packages("dplyr", repos = "https://cloud.r-project.org")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
install.packages("psych", repos = "https://cloud.r-project.org")
library(psych)
install.packages("ggplot2", repos = "https://cloud.r-project.org")
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha

1) What is the population of interest in the Current Population Survey (CPS)? Be precise: are there any age criteria, occupation criteria, and/or geographic criteria?

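The population of interest is the U.S. civilian noninstitutional population ages 16 and older. The sample excludes people living in institutions (such as prisons and nursing homes) and members of the armed forces on active duty, which is the survey's occupation-related exclusion. Geographically, it covers residents of the 50 states and the District of Columbia. The age criterion is 16 and over, with no upper limit.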

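The Current Population Survey (CPS) is a sample survey of about 60,000 eligible households conducted by the U.S. Census Bureau. A distinctive aspect of its methodology is its revolving (rotating) panel design: each household is interviewed for four consecutive months, leaves the sample for eight months, and is then interviewed for four more months.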
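The month-in-sample schedule behind the revolving panel can be sketched in R. This is a simplified model, assuming the standard 4-8-4 rotation (four months in, eight out, four in) and ignoring nonresponse and sample replacement; the helper names `months_in_sample()`, `groups_in_month()`, and `overlap_share()` are hypothetical. It shows why consecutive months share most of their respondents.

```r
# Simplified model of the CPS 4-8-4 rotation (ignores nonresponse and
# sample replacement). Months are numbered consecutively: 1, 2, 3, ...

# Months in which a rotation group entering in `first_month` is interviewed:
# four consecutive months, eight months out, then four more months.
months_in_sample <- function(first_month) {
  first_month + c(0:3, 12:15)
}

# Entry months of all rotation groups interviewed in `month`
# (a group can only be in sample if it entered within the last 15 months).
groups_in_month <- function(month) {
  entries <- (month - 15):month
  in_sample <- vapply(entries,
                      function(e) month %in% months_in_sample(e),
                      logical(1))
  entries[in_sample]
}

# Share of this month's rotation groups also interviewed `lag` months earlier
overlap_share <- function(month, lag) {
  now <- groups_in_month(month)
  before <- groups_in_month(month - lag)
  length(intersect(now, before)) / length(now)
}

length(groups_in_month(20))  # 8 rotation groups in sample each month
overlap_share(20, 1)         # 0.75: month-to-month overlap
overlap_share(20, 12)        # 0.5: year-to-year overlap
```

Under this model, three quarters of the sample overlaps from one month to the next and half overlaps from one year to the next, which is what lets the CPS track change using many of the same households.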

3) After reading about its methodology or doing your own online research, do you think the CPS is a representative sample of the entire U.S. population?

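The CPS is a representative sample, but of the U.S. civilian noninstitutional population rather than the entire U.S. population, because institutionalized people and active-duty armed forces are excluded by design. Within that population, representativeness comes from a multistage probability sample of households, combined with survey weights that scale each respondent to the number of people they represent. The sample size is set to meet reliability targets for measuring the unemployment rate at the national and state levels.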
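The role of survey weights can be illustrated with a toy example. The wages and weights below are made up: respondents from an undersampled group carry larger weights because each one represents more people, so the weighted mean can differ sharply from the raw sample mean.

```r
# Hypothetical toy sample: two groups sampled at different rates.
# `weight` = number of people in the population each respondent represents.
toy <- data.frame(
  wage   = c(20000, 22000, 80000, 90000),
  weight = c(3000, 3000, 500, 500)
)

# Unweighted mean treats every respondent equally
mean(toy$wage)                       # 53000

# Weighted mean scales each respondent by the people they represent
weighted.mean(toy$wage, toy$weight)  # about 30142.86
```

With the real extract, the analogous estimate would be something like `weighted.mean(df2$INCWAGE, df2$ASECWT)`, assuming the ASEC person weight `ASECWT` has been added to the IPUMS extract.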

4)

cps_ddi_file <- "cps_00001.xml"
cps_data_file <- "cps_00001.dat"

# Read the DDI codebook (variable metadata) once, then reuse it to read the microdata
cps_ddi <- read_ipums_ddi(cps_ddi_file)
cps_data <- read_ipums_micro(cps_ddi, data_file = cps_data_file)
## Use of data from IPUMS CPS is subject to conditions including that users should
## cite the data appropriately. Use command `ipums_conditions()` for more details.

6) Plot or summarize the wage income variable (INCWAGE) by labor force status. You can revise your data extract as you decide which variables to add, drop, or keep. Do you find any patterns in labor force statistics that make sense, such as income varying by labor force status?

# Keep 2022 ASEC records (ASECFLAG == 1); INCWAGE is collected in the ASEC
df <- cps_data %>%
  filter(YEAR == 2022, ASECFLAG == 1)

# Drop the N.I.U. code (99999999) so it does not distort wage summaries
df2 <- df %>% filter(INCWAGE != 99999999)
describe(df2$INCWAGE)
##    vars      n    mean       sd median trimmed   mad min     max   range skew
## X1    1 121119 36799.6 68923.98  15000 36799.6 22239   0 1550000 1550000 7.27
##    kurtosis     se
## X1    89.55 198.05
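Filtering out 99999999 before calling `describe()` matters because IPUMS stores "not in universe" as a numeric code in the same column as real dollar amounts. A toy vector (values made up) shows how a single unfiltered code distorts the mean; recoding it to `NA` is an equivalent alternative to filtering.

```r
# Hypothetical INCWAGE values; 99999999 is the N.I.U. code, not a wage
wages <- c(0, 15000, 42000, 99999999)

mean(wages)                         # about 25 million, inflated by the code

# Option 1: drop the code (what filter(INCWAGE != 99999999) does)
mean(wages[wages != 99999999])      # 19000

# Option 2: recode it to NA and skip missing values
wages_na <- replace(wages, wages == 99999999, NA)
mean(wages_na, na.rm = TRUE)        # 19000
```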
describe(df2$LABFORCE)
##    vars      n mean  sd median trimmed mad min max range  skew kurtosis se
## X1    1 121119 1.61 0.5      2    1.61   0   0   2     2 -0.53    -1.45  0
# LABFORCE and INCWAGE are labelled vectors: convert status to a factor so it
# plots as categories, and strip the labels from income so it plots as a number
ggplot(data = df2,
       mapping = aes(x = haven::as_factor(LABFORCE),
                     y = haven::zap_labels(INCWAGE))) +
  geom_boxplot() +
  labs(x = "Labor force status", y = "Wage and salary income (INCWAGE)")
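Columns read by ipumsr such as LABFORCE are `haven_labelled` vectors: integer codes with value labels attached, which is why ggplot2 cannot pick a discrete scale for them on its own. A small sketch with a made-up vector shows how `haven::as_factor()` turns the codes into labelled categories; the labels mirror those in the summary table below.

```r
library(haven)

# Hypothetical LABFORCE codes with IPUMS-style value labels
labforce <- labelled(
  c(2L, 1L, 2L, 0L, 2L),
  labels = c("NIU" = 0L,
             "No, not in the labor force" = 1L,
             "Yes, in the labor force" = 2L)
)

# Convert codes to a factor whose levels are the value labels
status <- as_factor(labforce)
levels(status)
table(status)
```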

df2 %>% 
  group_by(LABFORCE) %>%
  summarize(mincome = mean(INCWAGE),
            medincome = median(INCWAGE))
## # A tibble: 3 × 3
##   LABFORCE                       mincome medincome
##   <int+lbl>                        <dbl>     <dbl>
## 1 0 [NIU]                         63039.     55681
## 2 1 [No, not in the labor force]   2424.         0
## 3 2 [Yes, in the labor force]     58378.     42000

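Yes, the results make sense: people in the labor force have much higher wage income (mean of about $58,378, median $42,000) than people not in the labor force (mean of about $2,424, median $0). The 0 [NIU] group falls outside the LABFORCE universe, so it is not a meaningful comparison group here.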
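A median of $0 for people not in the labor force means that at least half of them report no wage income at all. A toy sketch (hypothetical records) adds a count and the share of zero earners to the same kind of grouped summary; on the real extract, the same `summarize()` call could be applied to `df2` grouped by `LABFORCE`.

```r
library(dplyr)

# Hypothetical person records: labor force status and annual wage income
toy <- tibble(
  status = c("In LF", "In LF", "In LF", "Not in LF", "Not in LF", "Not in LF"),
  wage   = c(30000, 42000, 90000, 0, 0, 12000)
)

res <- toy %>%
  group_by(status) %>%
  summarize(
    n          = n(),
    mean_wage  = mean(wage),
    med_wage   = median(wage),
    share_zero = mean(wage == 0)   # fraction of people with no wage income
  )
res
```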