This study examines the effect of immigration on US wages. It uses a difference-in-differences design to compare wages in Miami (the treatment group) with wages in four control cities before and after the Mariel Boatlift, the mass immigration of Cubans to Miami in 1980. If immigration lowers wages, wages in Miami should fall relative to the control cities after the Mariel Boatlift.
library(readr)
library(ggplot2)
library(dplyr)
stat_assign1_data <- read.csv("/Users/jenniferlu/Desktop/R/data/stat_assign1_data.csv")
The data come from the Current Population Survey (CPS), conducted by the U.S. Census Bureau. The dataset consists of 80,456 observations from 1976 to 1984 and has 5 variables: year, metarea, age, wage, and edcode.
Before 1980, the treatment group (Miami) had an average wage of 10.50 per hour. It is compared with four control cities: Minneapolis (10.58), Baltimore (10.17), Newark (10.61), and Rochester (10.50).
# Keep Miami (metarea 5000) and the four comparison cities
filtered_wages <- stat_assign1_data %>%
  filter(metarea %in% c(720, 5120, 5605, 6840, 5000))
summary(filtered_wages)
## year metarea age wage edcode
## Min. :1976 Min. : 720 Min. :25.00 Min. : 1.525 Min. :1.000
## 1st Qu.:1978 1st Qu.: 720 1st Qu.:31.00 1st Qu.: 6.642 1st Qu.:2.000
## Median :1980 Median :5120 Median :38.00 Median : 9.442 Median :2.000
## Mean :1980 Mean :4092 Mean :39.68 Mean :10.261 Mean :2.614
## 3rd Qu.:1982 3rd Qu.:5605 3rd Qu.:48.00 3rd Qu.:12.659 3rd Qu.:4.000
## Max. :1984 Max. :6840 Max. :59.00 Max. :39.774 Max. :4.000
Create a dummy variable post, equal to 1 if year >= 1980 and 0 if year < 1980.
filtered_wages$post = as.numeric(filtered_wages$year >= 1980)
Create a dummy variable treat, equal to 1 if the individual is from Miami and 0 otherwise.
filtered_wages$treat = as.numeric(filtered_wages$metarea == 5000)
Create the interaction term post*treat, which is the difference-in-differences variable.
filtered_wages$DD = filtered_wages$treat * filtered_wages$post
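The baseline difference-in-differences regression is $y_{it} = \beta_0 + \beta_1 (post_t \times treat_i) + \beta_2 post_t + \beta_3 treat_i + \epsilon_{it}$. The model estimated below replaces post_t with year fixed effects and adds city (metarea) and education (edcode) fixed effects.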
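To see why $\beta_1$ is the difference-in-differences effect, it helps to write out the four group means implied by the regression above. This is a sketch for the simple specification only, without the additional fixed effects, and it assumes $E[\epsilon_{it} \mid treat_i, post_t] = 0$:

```latex
\begin{aligned}
E[y_{it} \mid treat_i = 0,\ post_t = 0] &= \beta_0 \\
E[y_{it} \mid treat_i = 0,\ post_t = 1] &= \beta_0 + \beta_2 \\
E[y_{it} \mid treat_i = 1,\ post_t = 0] &= \beta_0 + \beta_3 \\
E[y_{it} \mid treat_i = 1,\ post_t = 1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \\[6pt]
\underbrace{\big[(\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_3)\big]}_{\text{change in Miami}}
- \underbrace{\big[(\beta_0 + \beta_2) - \beta_0\big]}_{\text{change in control cities}}
&= \beta_1
\end{aligned}
```

The change in the control cities ($\beta_2$) is subtracted from the change in Miami ($\beta_1 + \beta_2$), which leaves $\beta_1$.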
m1 <- lm(wage ~ DD + treat + factor(metarea) + factor(edcode) + factor(year), data = filtered_wages)
summary(m1)
##
## Call:
## lm(formula = wage ~ DD + treat + factor(metarea) + factor(edcode) +
## factor(year), data = filtered_wages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.979 -3.309 -0.631 2.470 28.859
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.87147 0.22738 39.016 < 2e-16 ***
## DD 0.04550 0.36818 0.124 0.9017
## treat -1.36337 0.28703 -4.750 2.07e-06 ***
## factor(metarea)5000 NA NA NA NA
## factor(metarea)5120 -0.11375 0.15738 -0.723 0.4699
## factor(metarea)5605 0.15308 0.17017 0.900 0.3684
## factor(metarea)6840 0.03875 0.21071 0.184 0.8541
## factor(edcode)2 1.27736 0.17501 7.299 3.21e-13 ***
## factor(edcode)3 1.84197 0.20458 9.004 < 2e-16 ***
## factor(edcode)4 4.68472 0.18038 25.972 < 2e-16 ***
## factor(year)1977 0.13627 0.25067 0.544 0.5867
## factor(year)1978 -0.25569 0.25025 -1.022 0.3069
## factor(year)1979 -0.58864 0.25011 -2.354 0.0186 *
## factor(year)1980 -0.35451 0.25238 -1.405 0.1602
## factor(year)1981 -1.25498 0.25388 -4.943 7.85e-07 ***
## factor(year)1982 -1.35309 0.25501 -5.306 1.15e-07 ***
## factor(year)1983 -1.05906 0.25305 -4.185 2.88e-05 ***
## factor(year)1984 -1.21585 0.25129 -4.838 1.34e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.046 on 7362 degrees of freedom
## Multiple R-squared: 0.1188, Adjusted R-squared: 0.1169
## F-statistic: 62.02 on 16 and 7362 DF, p-value: < 2.2e-16
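The `NA` row for `factor(metarea)5000` in the output above is what `lm()` prints when one regressor is an exact linear combination of others. The following minimal sketch reproduces that behaviour on simulated data; the `toy` data frame and its values are made up for illustration and are not the CPS sample.

```r
# Toy data (simulated, not the CPS sample): three cities, one of them treated.
set.seed(1)
toy <- data.frame(metarea = rep(c(720, 5000, 5120), each = 20))
toy$treat <- as.numeric(toy$metarea == 5000)
toy$wage <- 10 - 1.4 * toy$treat + rnorm(nrow(toy))

# treat is identical to the dummy for metarea 5000, so the two are collinear.
# lm() keeps the first of the two columns (treat) and reports NA for the other.
fit <- lm(wage ~ treat + factor(metarea), data = toy)
coef(fit)

# With the city dummies in the model, the treat coefficient measures the
# treated city relative to the omitted reference city (720), not relative
# to the average of all comparison cities.
```

Dropping either `treat` or `factor(metarea)` from the formula would remove the singularity; which one to keep depends on whether city-specific intercepts are wanted.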
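The estimated DD coefficient is 0.05 (standard error 0.37, p = 0.90). The point estimate is small and positive, but it is not statistically different from zero, so the regression finds no significant effect of the Mariel Boatlift on wages in Miami. The treat coefficient is -1.36, which means that wages in Miami were about 1.36 per hour lower than in the omitted reference city (metarea 720), holding education and year fixed. Because the model also includes city dummies, treat duplicates the Miami dummy factor(metarea)5000, which R drops (the NA row in the output).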
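The coefficient table reports an estimate and a standard error for DD; a 95% confidence interval makes the precision of that estimate explicit. The sketch below uses only the numbers printed in the `summary(m1)` output, with hypothetical variable names (`estimate`, `std_error`, `df_resid`) chosen for illustration.

```r
# Values copied from the summary(m1) output above.
estimate  <- 0.04550   # DD coefficient
std_error <- 0.36818   # its standard error
df_resid  <- 7362      # residual degrees of freedom

# Two-sided 95% interval based on the t distribution.
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error
round(ci, 2)
```

The interval runs from roughly -0.68 to 0.77, so it comfortably includes zero. With the fitted model available, `confint(m1, "DD")` should give the same interval directly.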
Compute the mean wage by year and treatment group.
data.means <- filtered_wages %>%
  group_by(year, treat) %>%
  summarize(avg_wage = mean(wage))
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
data.means
## # A tibble: 18 x 3
## # Groups: year [9]
## year treat avg_wage
## <int> <dbl> <dbl>
## 1 1976 0 10.7
## 2 1976 1 9.82
## 3 1977 0 11.2
## 4 1977 1 9.11
## 5 1978 0 10.7
## 6 1978 1 9.26
## 7 1979 0 10.5
## 8 1979 1 8.85
## 9 1980 0 10.7
## 10 1980 1 9.27
## 11 1981 0 9.90
## 12 1981 1 8.13
## 13 1982 0 9.91
## 14 1982 1 7.38
## 15 1983 0 10.3
## 16 1983 1 8.30
## 17 1984 0 10.1
## 18 1984 1 9.90
ggplot(data.means, aes(x = year, y = avg_wage, linetype = factor(treat))) + geom_line() + geom_vline(xintercept = 1980) + scale_linetype_discrete(labels = c("Comparison Cities", "Miami"))
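The yearly means plotted above can also be collapsed into a simple 2x2 difference-in-differences. This is only a rough check: it averages the rounded yearly means from the printed table without weighting by sample size and without the education or city controls, so it is not expected to match the regression coefficient exactly. The vector names are hypothetical.

```r
# Yearly means copied from the data.means table above (rounded as printed).
years   <- 1976:1984
miami   <- c(9.82, 9.11, 9.26, 8.85, 9.27, 8.13, 7.38, 8.30, 9.90)
control <- c(10.7, 11.2, 10.7, 10.5, 10.7, 9.90, 9.91, 10.3, 10.1)

post <- years >= 1980

# Change in Miami minus change in the comparison cities.
change_miami   <- mean(miami[post])   - mean(miami[!post])
change_control <- mean(control[post]) - mean(control[!post])
did <- change_miami - change_control
round(did, 2)
```

Both groups' average wages fall by about 0.6 to 0.7 after 1980, so the difference between the two changes is close to zero (about -0.07), in line with the small and insignificant DD coefficient.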
## Immigration and low-skill workers
Repeat the test for low-skill workers (edcode = 1) in Miami and the comparison cities. treat2 is NA for all other workers, so lm() drops those observations.
filtered_wages$treat2 = ifelse(filtered_wages$edcode == 1, as.numeric(filtered_wages$metarea == 5000), NA)
Create the interaction term post*treat2, the difference-in-differences variable for the low-skill sample.
filtered_wages$DD_low = filtered_wages$treat2 * filtered_wages$post
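Regression for the low-skill sample: $y_{it} = \beta_0 + \beta_1 (post_t \times treat_i) + \beta_2 post_t + \beta_3 treat_i + \epsilon_{it}$, again estimated with year and city fixed effects in place of post_t. edcode is omitted because it is constant within this sample.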
m1_low <- lm(wage ~ DD_low + treat2 + factor(metarea) + factor(year), data = filtered_wages)
summary(m1_low)
##
## Call:
## lm(formula = wage ~ DD_low + treat2 + factor(metarea) + factor(year),
## data = filtered_wages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.4753 -2.8050 -0.4536 2.1566 29.5234
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.64661 0.30826 28.050 < 2e-16 ***
## DD_low -0.28018 0.61611 -0.455 0.64936
## treat2 -1.21505 0.46296 -2.625 0.00878 **
## factor(metarea)5000 NA NA NA NA
## factor(metarea)5120 0.60693 0.33297 1.823 0.06858 .
## factor(metarea)5605 -0.11504 0.29890 -0.385 0.70040
## factor(metarea)6840 0.93791 0.41389 2.266 0.02362 *
## factor(year)1977 0.69298 0.41444 1.672 0.09476 .
## factor(year)1978 0.08936 0.43345 0.206 0.83670
## factor(year)1979 -0.32804 0.42502 -0.772 0.44037
## factor(year)1980 -0.61709 0.43889 -1.406 0.15996
## factor(year)1981 -1.42206 0.44592 -3.189 0.00146 **
## factor(year)1982 -1.30302 0.45321 -2.875 0.00411 **
## factor(year)1983 -0.72055 0.45509 -1.583 0.11360
## factor(year)1984 -1.57139 0.48041 -3.271 0.00110 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.871 on 1248 degrees of freedom
## (6117 observations deleted due to missingness)
## Multiple R-squared: 0.0617, Adjusted R-squared: 0.05193
## F-statistic: 6.313 on 13 and 1248 DF, p-value: 1.186e-11
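The DD_low coefficient is -0.28 (standard error 0.62, p = 0.65). The point estimate implies that low-skill workers in Miami earned about 0.28 less per hour after 1980 relative to the comparison cities, but the estimate is not statistically significant.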
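The low-skill regression above restricts the sample by setting treat2 to NA for everyone else and letting `lm()` drop the incomplete rows. An alternative is to subset the data first and reuse the ordinary variables. The sketch below checks on simulated data that the two approaches give the same coefficients; the `toy` data frame and its coefficients are made up, and the specification is simplified to the basic post/treat form.

```r
# Simulated data (not the CPS sample) with four education groups.
set.seed(2)
n <- 400
toy <- data.frame(
  edcode = sample(1:4, n, replace = TRUE),
  treat  = rbinom(n, 1, 0.3),
  post   = rbinom(n, 1, 0.5)
)
toy$wage <- 8 + toy$edcode - toy$treat - 0.3 * toy$treat * toy$post + rnorm(n)

# Approach used above: treat2 is NA outside the low-skill group,
# so lm() silently drops those rows.
toy$treat2 <- ifelse(toy$edcode == 1, toy$treat, NA)
toy$DD_low <- toy$treat2 * toy$post
fit_na <- lm(wage ~ DD_low + treat2 + post, data = toy)

# Alternative: subset first, then build the interaction as usual.
low <- subset(toy, edcode == 1)
low$DD <- low$treat * low$post
fit_subset <- lm(wage ~ DD + treat + post, data = low)

# Same rows, same regressors, same estimates.
all.equal(unname(coef(fit_na)), unname(coef(fit_subset)))
```

Subsetting explicitly makes the sample restriction visible in the code and avoids the "observations deleted due to missingness" note in the summary.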
data.means2 <- filtered_wages %>%
  group_by(year, treat2) %>%
  summarize(avg_wage_low = mean(wage))
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
data.means2 <- na.omit(data.means2)
data.means2
## # A tibble: 18 x 3
## # Groups: year [9]
## year treat2 avg_wage_low
## <int> <dbl> <dbl>
## 1 1976 0 8.77
## 2 1976 1 7.74
## 3 1977 0 9.47
## 4 1977 1 8.27
## 5 1978 0 8.99
## 6 1978 1 7.16
## 7 1979 0 8.58
## 8 1979 1 6.92
## 9 1980 0 7.98
## 10 1980 1 7.95
## 11 1981 0 7.27
## 12 1981 1 6.43
## 13 1982 0 7.63
## 14 1982 1 5.23
## 15 1983 0 8.22
## 16 1983 1 5.49
## 17 1984 0 7.30
## 18 1984 1 5.22
ggplot(data.means2, aes(x = year, y = avg_wage_low, linetype = factor(treat2))) + geom_line(aes(linetype = factor(treat2))) + geom_point() + geom_vline(xintercept = 1980) + scale_linetype_discrete(labels = c("Comparison Cities", "Miami"))
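The graph shows that wages of low-skill workers in Miami fell sharply after the Mariel Boatlift, from 7.95 per hour in 1980 to 5.22 in 1984, and that the gap with the comparison cities exceeded 2 per hour in 1982 through 1984.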
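The year-by-year wage gap between Miami and the comparison cities can be read off the low-skill means table directly. The sketch below uses the rounded values as printed in `data.means2`; the vector names are hypothetical, and the gaps are raw differences in means with no controls.

```r
# Low-skill yearly means copied from the data.means2 table above.
years       <- 1976:1984
miami_low   <- c(7.74, 8.27, 7.16, 6.92, 7.95, 6.43, 5.23, 5.49, 5.22)
control_low <- c(8.77, 9.47, 8.99, 8.58, 7.98, 7.27, 7.63, 8.22, 7.30)

# Negative values mean low-skill wages in Miami are below the comparison cities.
gap <- miami_low - control_low
data.frame(year = years, gap = round(gap, 2))
```

The gap is already between about -1.0 and -1.8 before 1980, nearly closes in 1980, and then widens to more than -2 from 1982 onward. The pre-1980 variation is worth keeping in mind when judging whether the two groups followed parallel trends.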