Effect of Immigration on Wages

This is a study of the effect of immigration on US wages. The study uses difference-in-differences to compare wages in Miami, the treatment group, with wages in four control cities before and after the Mariel Boatlift (the mass immigration of Cubans to Miami in 1980). If immigration lowers wages, we should expect to see a decrease in wages in Miami relative to the control cities after the Mariel Boatlift.

library(readr)
library(ggplot2)
library(dplyr)
stat_assign1_data <- read.csv("/Users/jenniferlu/Desktop/R/data/stat_assign1_data.csv")

Dataset Details

The data come from the Current Population Survey (CPS), conducted by the U.S. Census Bureau. The dataset consists of 80,456 observations from 1976 to 1984. There are 5 variables in the dataset:

  • year: calendar year
  • metarea: CPS area code (https://cps.ipums.org/cps-action/variables/METAREA#codes_section)
  • age
  • wage: hourly wage
  • edcode: education, coded as: 1 = high school dropout, 2 = high school graduate, 3 = some college, 4 = college graduate

Before 1980, the treatment group (Miami) had an average wage of 10.50 per hour. It is compared with four control cities: Minneapolis (10.58), Baltimore (10.17), Newark (10.61), and Rochester (10.50).
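Pre-period averages like these can be computed by restricting the data to years before 1980 and averaging wage within each metarea code. The following is a minimal sketch in base R on a made-up data frame; `toy` and its values are hypothetical and are not taken from the CPS file.

```r
# Hypothetical toy data: two cities, a few worker-level wage observations
toy <- data.frame(
  year    = c(1978, 1979, 1981, 1978, 1979, 1982),
  metarea = c(5000, 5000, 5000, 720, 720, 720),
  wage    = c(9.0, 10.0, 8.0, 10.0, 11.0, 9.5)
)

# Keep only the pre-treatment years, then average wage within each city code
pre <- toy[toy$year < 1980, ]
pre_means <- aggregate(wage ~ metarea, data = pre, FUN = mean)
pre_means
```

Applied to filtered_wages, the same two steps would give one pre-1980 mean per city.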

filtered_wages <- stat_assign1_data %>%
  # 5000 = Miami; the other four codes are the comparison cities
  filter(metarea %in% c(720, 5120, 5605, 6840, 5000))

summary(filtered_wages)
##       year         metarea          age             wage            edcode     
##  Min.   :1976   Min.   : 720   Min.   :25.00   Min.   : 1.525   Min.   :1.000  
##  1st Qu.:1978   1st Qu.: 720   1st Qu.:31.00   1st Qu.: 6.642   1st Qu.:2.000  
##  Median :1980   Median :5120   Median :38.00   Median : 9.442   Median :2.000  
##  Mean   :1980   Mean   :4092   Mean   :39.68   Mean   :10.261   Mean   :2.614  
##  3rd Qu.:1982   3rd Qu.:5605   3rd Qu.:48.00   3rd Qu.:12.659   3rd Qu.:4.000  
##  Max.   :1984   Max.   :6840   Max.   :59.00   Max.   :39.774   Max.   :4.000

Setting up the Difference-In-Difference

Create a dummy variable post, equal to 1 if year >= 1980 and 0 if year < 1980.

filtered_wages$post = as.numeric(filtered_wages$year >= 1980)

Create a dummy variable treat, equal to 1 if the individual is from Miami and 0 otherwise.

filtered_wages$treat = as.numeric(filtered_wages$metarea == 5000)

Create the interaction term post*treat, the difference-in-differences variable.

filtered_wages$DD = filtered_wages$treat * filtered_wages$post

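Regression: y_it = beta_0 + beta_1 (post_t * treat_i) + beta_2 post_t + beta_3 treat_i + epsilon_it. In the fitted model, year fixed effects take the place of post_t, and city (metarea) and education (edcode) fixed effects are added as controls. The coefficient of interest is beta_1.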
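In the simplest case, with two groups, two periods, and no other controls, beta_1 is the before/after change in the treated group minus the before/after change in the control group. The sketch below checks this on a small made-up data frame; `toy` and its wage values are hypothetical and are not taken from the CPS data.

```r
# Hypothetical toy data: two observations in each treat x post cell
toy <- data.frame(
  treat = c(0, 0, 0, 0, 1, 1, 1, 1),
  post  = c(0, 0, 1, 1, 0, 0, 1, 1),
  wage  = c(10, 11, 10, 9, 9, 10, 8, 8)
)

# Four cell means: rows are treat (0/1), columns are post (0/1)
means <- tapply(toy$wage, list(treat = toy$treat, post = toy$post), mean)

# Difference-in-differences by hand:
# (treated after - treated before) - (control after - control before)
dd_by_hand <- unname((means["1", "1"] - means["1", "0"]) -
                     (means["0", "1"] - means["0", "0"]))

# The same quantity as the interaction coefficient of a saturated regression
fit <- lm(wage ~ treat * post, data = toy)
dd_from_lm <- unname(coef(fit)["treat:post"])

c(by_hand = dd_by_hand, from_lm = dd_from_lm)
```

Both numbers agree here because the regression has exactly one parameter per cell. Once fixed effects and controls are added, as in the model fitted to the CPS data, the coefficient is no longer a simple difference of four raw means.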

m1 <- lm(wage ~ DD + treat + factor(metarea) + factor(edcode) + factor(year), data = filtered_wages)

summary(m1)
## 
## Call:
## lm(formula = wage ~ DD + treat + factor(metarea) + factor(edcode) + 
##     factor(year), data = filtered_wages)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.979  -3.309  -0.631   2.470  28.859 
## 
## Coefficients: (1 not defined because of singularities)
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          8.87147    0.22738  39.016  < 2e-16 ***
## DD                   0.04550    0.36818   0.124   0.9017    
## treat               -1.36337    0.28703  -4.750 2.07e-06 ***
## factor(metarea)5000       NA         NA      NA       NA    
## factor(metarea)5120 -0.11375    0.15738  -0.723   0.4699    
## factor(metarea)5605  0.15308    0.17017   0.900   0.3684    
## factor(metarea)6840  0.03875    0.21071   0.184   0.8541    
## factor(edcode)2      1.27736    0.17501   7.299 3.21e-13 ***
## factor(edcode)3      1.84197    0.20458   9.004  < 2e-16 ***
## factor(edcode)4      4.68472    0.18038  25.972  < 2e-16 ***
## factor(year)1977     0.13627    0.25067   0.544   0.5867    
## factor(year)1978    -0.25569    0.25025  -1.022   0.3069    
## factor(year)1979    -0.58864    0.25011  -2.354   0.0186 *  
## factor(year)1980    -0.35451    0.25238  -1.405   0.1602    
## factor(year)1981    -1.25498    0.25388  -4.943 7.85e-07 ***
## factor(year)1982    -1.35309    0.25501  -5.306 1.15e-07 ***
## factor(year)1983    -1.05906    0.25305  -4.185 2.88e-05 ***
## factor(year)1984    -1.21585    0.25129  -4.838 1.34e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.046 on 7362 degrees of freedom
## Multiple R-squared:  0.1188, Adjusted R-squared:  0.1169 
## F-statistic: 62.02 on 16 and 7362 DF,  p-value: < 2.2e-16
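
The NA row for factor(metarea)5000 appears because treat is identical to the Miami city dummy, so the two columns are perfectly collinear and lm() keeps only the first. The sketch below reproduces this on made-up data; `toy` and its values are hypothetical. It also shows that the surviving treat coefficient measures Miami against the omitted baseline city.

```r
# Hypothetical toy data: three cities, four wage observations each
toy <- data.frame(
  metarea = rep(c(720, 5000, 5120), each = 4),
  wage    = c(10, 11, 10.5, 10.2,  9, 9.5, 8.8, 9.1,  10.1, 10.4, 9.9, 10.3)
)
toy$treat <- as.numeric(toy$metarea == 5000)

# treat duplicates the metarea 5000 dummy, so one of the two cannot be estimated
fit <- lm(wage ~ treat + factor(metarea), data = toy)
coef(fit)

# The treat coefficient equals the Miami mean minus the baseline (720) mean
miami_minus_baseline <- mean(toy$wage[toy$metarea == 5000]) -
  mean(toy$wage[toy$metarea == 720])
```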

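The DD coefficient is 0.05 (standard error 0.37, p = 0.90), so the estimated effect of the Mariel Boatlift on wages in Miami is small, positive, and not statistically distinguishable from zero. The treat coefficient is -1.36, which means that wages in Miami were 1.36 per hour lower than in the baseline comparison city (metarea 720), holding education and year constant.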
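
To gauge the precision of the DD estimate, a 95% confidence interval can be reconstructed from the estimate, standard error, and residual degrees of freedom reported in the summary output. This is a minimal sketch with illustrative variable names; calling confint(m1) on the fitted model should give the same interval directly.

```r
# Values copied from the summary(m1) output
est <- 0.04550   # DD estimate
se  <- 0.36818   # DD standard error
df  <- 7362      # residual degrees of freedom

# t statistic and 95% confidence interval based on the t distribution
t_value <- est / se
ci <- est + c(-1, 1) * qt(0.975, df) * se

t_value
ci
```

The interval runs from roughly -0.68 to 0.77 and includes zero, which is consistent with the large p-value.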

Graphing the Difference-In-Difference

Mean wage by year for Miami (treat = 1) and the comparison cities (treat = 0):

data.means <- filtered_wages %>%
  group_by(year, treat) %>%
  summarize(avg_wage = mean(wage))
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
data.means
## # A tibble: 18 x 3
## # Groups:   year [9]
##     year treat avg_wage
##    <int> <dbl>    <dbl>
##  1  1976     0    10.7 
##  2  1976     1     9.82
##  3  1977     0    11.2 
##  4  1977     1     9.11
##  5  1978     0    10.7 
##  6  1978     1     9.26
##  7  1979     0    10.5 
##  8  1979     1     8.85
##  9  1980     0    10.7 
## 10  1980     1     9.27
## 11  1981     0     9.90
## 12  1981     1     8.13
## 13  1982     0     9.91
## 14  1982     1     7.38
## 15  1983     0    10.3 
## 16  1983     1     8.30
## 17  1984     0    10.1 
## 18  1984     1     9.90
ggplot(data.means, aes(x = year, y = avg_wage, linetype = factor(treat))) + geom_line() + geom_vline(xintercept = 1980) + scale_linetype_discrete(labels = c("Comparison Cities", "Miami"))

Effect of Immigration on Low-Skill Workers

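Test on low-skill workers (edcode = 1) in Miami and the comparison cities. The variable treat2 equals 1 for low-skill workers in Miami, 0 for low-skill workers in the comparison cities, and NA for all other education groups, so lm() drops those rows.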

filtered_wages$treat2 = ifelse(filtered_wages$edcode == 1, as.numeric(filtered_wages$metarea == 5000), NA)

Create the interaction term post*treat2, the difference-in-differences variable for low-skill workers.

filtered_wages$DD_low = filtered_wages$treat2 * filtered_wages$post

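Regression: y_it = beta_0 + beta_1 (post_t * treat_i) + beta_2 post_t + beta_3 treat_i + epsilon_it, estimated on low-skill workers only. As before, year fixed effects take the place of post_t and city (metarea) fixed effects are added.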

m1_low <- lm(wage ~ DD_low + treat2 + factor(metarea) + factor(year), data = filtered_wages)

summary(m1_low)
## 
## Call:
## lm(formula = wage ~ DD_low + treat2 + factor(metarea) + factor(year), 
##     data = filtered_wages)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.4753 -2.8050 -0.4536  2.1566 29.5234 
## 
## Coefficients: (1 not defined because of singularities)
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          8.64661    0.30826  28.050  < 2e-16 ***
## DD_low              -0.28018    0.61611  -0.455  0.64936    
## treat2              -1.21505    0.46296  -2.625  0.00878 ** 
## factor(metarea)5000       NA         NA      NA       NA    
## factor(metarea)5120  0.60693    0.33297   1.823  0.06858 .  
## factor(metarea)5605 -0.11504    0.29890  -0.385  0.70040    
## factor(metarea)6840  0.93791    0.41389   2.266  0.02362 *  
## factor(year)1977     0.69298    0.41444   1.672  0.09476 .  
## factor(year)1978     0.08936    0.43345   0.206  0.83670    
## factor(year)1979    -0.32804    0.42502  -0.772  0.44037    
## factor(year)1980    -0.61709    0.43889  -1.406  0.15996    
## factor(year)1981    -1.42206    0.44592  -3.189  0.00146 ** 
## factor(year)1982    -1.30302    0.45321  -2.875  0.00411 ** 
## factor(year)1983    -0.72055    0.45509  -1.583  0.11360    
## factor(year)1984    -1.57139    0.48041  -3.271  0.00110 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.871 on 1248 degrees of freedom
##   (6117 observations deleted due to missingness)
## Multiple R-squared:  0.0617, Adjusted R-squared:  0.05193 
## F-statistic: 6.313 on 13 and 1248 DF,  p-value: 1.186e-11

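The DD_low coefficient is -0.28 (standard error 0.62, p = 0.65). The point estimate suggests that hourly wages of low-skill workers in Miami fell by 0.28 relative to low-skill workers in the comparison cities after 1980, but the estimate is not statistically significant.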
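
The NA coding of treat2 restricts the regression to low-skill workers only because lm() silently drops rows with missing values, as the "6117 observations deleted due to missingness" line in the output shows. An alternative is to subset to edcode == 1 explicitly, which should give the same coefficients. The sketch below illustrates the equivalence on made-up data; `toy`, `miami`, and the wage values are hypothetical.

```r
# Hypothetical toy data: alternating low-skill (1) and other (2) workers
toy <- data.frame(
  edcode = rep(1:2, times = 8),
  miami  = rep(c(0, 1), each = 8),
  post   = rep(rep(c(0, 1), each = 4), times = 2),
  wage   = c(8, 11, 9, 12, 7, 10, 8, 11, 7, 10, 6, 11, 5, 9, 5, 10)
)

# Approach 1: NA-code the treatment so lm() drops everyone who is not low-skill
toy$treat2 <- ifelse(toy$edcode == 1, toy$miami, NA)
toy$DD_low <- toy$treat2 * toy$post
fit_na <- lm(wage ~ DD_low + treat2 + post, data = toy)

# Approach 2: subset to low-skill workers first, then use ordinary dummies
low <- subset(toy, edcode == 1)
low$DD <- low$miami * low$post
fit_sub <- lm(wage ~ DD + miami + post, data = low)

# Same observations, same coefficients
nobs(fit_na)
cbind(na_coded = unname(coef(fit_na)), subset = unname(coef(fit_sub)))
```

Subsetting makes the sample restriction visible in the code instead of leaving it to the default na.action.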

data.means2 <- filtered_wages %>%
  group_by(year, treat2) %>%
  summarize(avg_wage_low = mean(wage))
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
data.means2 <- na.omit(data.means2)

data.means2
## # A tibble: 18 x 3
## # Groups:   year [9]
##     year treat2 avg_wage_low
##    <int>  <dbl>        <dbl>
##  1  1976      0         8.77
##  2  1976      1         7.74
##  3  1977      0         9.47
##  4  1977      1         8.27
##  5  1978      0         8.99
##  6  1978      1         7.16
##  7  1979      0         8.58
##  8  1979      1         6.92
##  9  1980      0         7.98
## 10  1980      1         7.95
## 11  1981      0         7.27
## 12  1981      1         6.43
## 13  1982      0         7.63
## 14  1982      1         5.23
## 15  1983      0         8.22
## 16  1983      1         5.49
## 17  1984      0         7.30
## 18  1984      1         5.22
ggplot(data.means2, aes(x = year, y = avg_wage_low, linetype = factor(treat2))) + geom_line(aes(linetype = factor(treat2))) + geom_point() + geom_vline(xintercept = 1980) + scale_linetype_discrete(labels = c("Comparison Cities", "Miami"))

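The graph shows wages of low-skill workers in Miami falling after 1980, from 7.95 in 1980 to 5.22 in 1984. Wages of low-skill workers in the comparison cities also fell over the same period (7.98 to 7.30), so it is the widening gap between the two lines, not the Miami decline by itself, that points to a possible effect of the Mariel Boatlift.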
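
The gap between the two lines can be read off the data.means2 table directly. The sketch below transcribes the yearly means from that table and computes the Miami-minus-comparison gap for each year, plus a rough before/after comparison of the average gap. The vector names are illustrative. The averages weight each year equally and ignore city fixed effects, so the result is not expected to match the regression estimate exactly.

```r
# Yearly mean wages of low-skill workers, transcribed from data.means2
year       <- 1976:1984
comparison <- c(8.77, 9.47, 8.99, 8.58, 7.98, 7.27, 7.63, 8.22, 7.30)
miami      <- c(7.74, 8.27, 7.16, 6.92, 7.95, 6.43, 5.23, 5.49, 5.22)

# Miami minus comparison cities, year by year
gap <- miami - comparison
data.frame(year, gap)

# Average gap before and after 1980, and the change between the two periods
gap_pre  <- mean(gap[year < 1980])
gap_post <- mean(gap[year >= 1980])
dd_gap   <- gap_post - gap_pre
dd_gap
```

The change in the average gap is about -0.19, the same sign as the DD_low coefficient. The yearly gaps also show that the two groups were almost level in 1980 before the gap widened through 1983.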