For my final project, I looked at historical data on how the US handles refugees from around the world. Despite current events and political rhetoric, the US has a long history of welcoming refugees and asylum seekers. Since 1988, the US has taken in an average of about 512,000 refugees a year (511,948.7, to be exact). Let’s take a closer look at the data and see whether any trends stand out.
The number of refugees generated worldwide seems fairly stable over time. Interestingly, the number of refugees the US accepted rose sharply under George W. Bush, then declined just as sharply at the end of his tenure and into President Obama’s. To get away from the log scale, here’s a simple graph that shows what percentage of the world’s refugees the US has taken in each year.
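The percentage in that graph is a simple ratio per year. Here is a minimal sketch of the arithmetic in R; the column names (`worldTotal`, `usTotal`) and the numbers are made up for illustration and are not taken from the UN data.

```r
# Toy numbers for illustration only; they are NOT taken from the UN data.
refugees <- data.frame(
  year       = c(2001, 2002, 2003),
  worldTotal = c(1000, 1200, 1100),  # refugees worldwide (hypothetical)
  usTotal    = c(50, 90, 44)         # of which taken in by the US (hypothetical)
)

# Share of the world's refugees taken in by the US, in percent.
refugees$usSharePct <- 100 * refugees$usTotal / refugees$worldTotal

print(refugees)
```

Plotting `usSharePct` against `year` on a linear axis gives the kind of graph described above.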
I doubt we treat all countries equally, though. Given the debate over a seemingly small number of Syrian refugees (10,000 more on top of roughly 500,000), there must be some countries, like Syria, from which it is difficult to reach the US, and others from which it is much easier. The following graph looks at countries that have produced a large number of refugees, and of whose refugee population the US has taken either a very high or a very low percentage.
Many opponents of expanding our refugee acceptance numbers argue that, with our own homeless population and domestic economic issues, it is not prudent to accept more people who will ostensibly be a drain on our welfare system. But let’s examine that. The following graph shows the median income over time for both standard immigrant and refugee populations.
The data indicate that both refugees and immigrants get on their feet fairly quickly and approach the median income of US-born citizens after about 20 years. Based on this, there seems to be some validity to the notion that refugees are a short-term drag on the economy. In their early years, they require an investment in welfare, job placement programs, English training, and so on. On the other hand, they appear to be a source of quality human capital in the long run, becoming indistinguishable from native-born citizens within a generation.
The data I used for this project comes from the UN Refugee Agency and the American Community Survey (ACS) Public Use Microdata.
To estimate the incomes of immigrant and refugee groups, I pooled ACS data from 2012, 2013, and 2014 and matched the survey data with the UN refugee data. For each (year, country of origin) pair, I estimated the share of immigrants who were refugees by extrapolating the total number of immigrants from the ACS data and comparing it with the known refugee population from the UN data.
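The matching step can be sketched roughly as follows. This is an illustration under assumptions, not the code used for the project: the data frames `acs` and `un`, the columns `yearOfEntry`, `weight`, and `refugees`, and all of the numbers are hypothetical, and capping the ratio at 1 is my own choice for cases where the UN count exceeds the survey estimate.

```r
# Hypothetical ACS-style person records: each row is one surveyed person,
# and `weight` is the number of people that row is taken to represent.
acs <- data.frame(
  placeOfBirth = c("A", "A", "A", "B", "B"),
  yearOfEntry  = c(2000, 2000, 2001, 2000, 2000),
  weight       = c(100, 150, 200, 300, 100)
)

# Hypothetical UN-style refugee counts per country and year.
un <- data.frame(
  placeOfBirth = c("A", "A", "B"),
  yearOfEntry  = c(2000, 2001, 2000),
  refugees     = c(50, 300, 0)
)

# Step 1: extrapolate total immigrants per (country, year) by summing weights.
immigrants <- aggregate(weight ~ placeOfBirth + yearOfEntry, data = acs, FUN = sum)
names(immigrants)[names(immigrants) == "weight"] <- "immigrants"

# Step 2: match to the refugee counts; groups with no UN row get zero refugees.
groups <- merge(immigrants, un, by = c("placeOfBirth", "yearOfEntry"), all.x = TRUE)
groups$refugees[is.na(groups$refugees)] <- 0

# Step 3: estimated refugee share of each group, capped at 1 (an assumption).
groups$refugeeRatio <- pmin(groups$refugees / groups$immigrants, 1)

print(groups)
```

Each row of `groups` then corresponds to one country-year group, with `refugeeRatio` available as a predictor.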
I then modeled income with the following formula:
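library(caret)  # provides train()
# One row of mydata per country-year group; in an R formula, a*b expands to a + b + a:b
lmIncome <- train(medianIncome ~ percentMale + percentEducated + percentBlack + percentVGEng +
                    log(refugeeRatio + 1) + yearsInUSA + log(yearsInUSA + 2) + medianAge +
                    factor(placeOfBirth) + refugeeRatio*yearsInUSA,
                  data = mydata, method = 'glm')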
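Written out as an equation, the formula corresponds roughly to the linear model below. The notation is mine, not part of the original analysis: \(g\) indexes a country-year group, \(\alpha_{c(g)}\) is the per-country effect from `factor(placeOfBirth)`, and I am assuming the `glm` method is fit with a Gaussian family so that the model is linear in the coefficients. Note that `refugeeRatio*yearsInUSA` expands in R to both main effects plus their interaction, which is why `refugeeRatio` also appears on its own.

```latex
\[
\begin{aligned}
\text{medianIncome}_{g} ={}& \beta_0
  + \beta_1\,\text{percentMale}_{g}
  + \beta_2\,\text{percentEducated}_{g}
  + \beta_3\,\text{percentBlack}_{g}
  + \beta_4\,\text{percentVGEng}_{g} \\
 &+ \beta_5 \log(\text{refugeeRatio}_{g} + 1)
  + \beta_6\,\text{yearsInUSA}_{g}
  + \beta_7 \log(\text{yearsInUSA}_{g} + 2)
  + \beta_8\,\text{medianAge}_{g} \\
 &+ \alpha_{c(g)}
  + \beta_9\,\text{refugeeRatio}_{g}
  + \beta_{10}\,\text{refugeeRatio}_{g} \times \text{yearsInUSA}_{g}
  + \varepsilon_{g}
\end{aligned}
\]
```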
I did not do any serious model testing, as it was not necessary for the scope of this project. Below are the predicted values of income plotted against the actual values for roughly 800 country-year groups. Note that I did not use a testing set and am plotting against my training data, so again, this model must be taken with a grain of salt. It was a quick fit for a design-oriented project.
predicted <- predict(lmIncome, newdata = mydata)  # fitted values on the training data
ggplot2::qplot(predicted, mydata$medianIncome)
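For comparison, this is roughly what a check against held-out data could look like. It is a sketch only: the data are synthetic (generated in the code, not the project's data), the formula is a simplified stand-in for the one above, and base R's `lm` is swapped in for `train` to keep the example self-contained.

```r
set.seed(42)  # make the random split reproducible

# Synthetic stand-in for the country-year groups; NOT the project's data.
n <- 800
toy <- data.frame(
  yearsInUSA   = runif(n, 0, 40),
  refugeeRatio = runif(n, 0, 1)
)
toy$medianIncome <- 15000 + 8000 * log(toy$yearsInUSA + 2) -
  5000 * toy$refugeeRatio + rnorm(n, sd = 3000)

# Hold out 20% of the groups for testing.
testIdx  <- sample(n, size = n / 5)
trainSet <- toy[-testIdx, ]
testSet  <- toy[testIdx, ]

# Fit on the training rows only (simplified formula, for illustration).
fit <- lm(medianIncome ~ log(yearsInUSA + 2) + refugeeRatio, data = trainSet)

# Score on rows the model did not see during fitting.
predictedTest <- predict(fit, newdata = testSet)
rmse <- sqrt(mean((predictedTest - testSet$medianIncome)^2))
print(rmse)
```

Plotting `predictedTest` against `testSet$medianIncome` would give a predicted-vs-actual plot that is not flattered by the model having already seen the points.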