This project relies on data provided by the U.S. Department of Homeland Security (DHS). On its website, you can download the Yearbook of Immigration Statistics for the past 15 years (https://www.dhs.gov/immigration-statistics/yearbook/2016).
The data is available in a PDF format as well as .xls tables.
I’ve chosen to focus this project on the characteristics of foreign nationals who were granted lawful permanent residence (i.e., immigrants who received a “green card”) over the past 10 years.
I downloaded the following tables and inserted the data into my immigration dataset:
- Table 1: Persons Obtaining Lawful Permanent Resident Status: Fiscal Years 1820 To 2016
- Table 3*: Persons Obtaining Lawful Permanent Resident Status By Region And Country Of Birth: Fiscal Years 2007 To 2016
- Table 8: Persons Obtaining Lawful Permanent Resident Status By Sex, Age, Marital Status, And Occupation: Fiscal Years 2007 To 2016
Note: Before importing the different tables into my dataset, I performed a bit of cleaning:
- Removed the headers and footnotes to facilitate the import
- Transposed rows and columns so that the year is a column
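The transposition step can be sketched in R roughly as follows. This is a minimal illustration, not the exact code used for this report: the `raw` data frame, its column names, and its values are made up, and it assumes the cleaned table has one row per category and one column per fiscal year.

```r
# Toy stand-in for a cleaned yearbook table: one row per category,
# one column per fiscal year. The values are made up for illustration.
raw <- data.frame(
  Category = c("Female", "Male"),
  X2015 = c(10, 8),
  X2016 = c(12, 9),
  stringsAsFactors = FALSE
)

# Transpose the numeric part so that each fiscal year becomes a row.
values <- t(raw[, -1])
tidy <- as.data.frame(values, stringsAsFactors = FALSE)
names(tidy) <- raw$Category

# Recover the year from the former column names ("X2015" -> 2015).
tidy <- cbind(Year = as.integer(sub("^X", "", rownames(values))), tidy)
rownames(tidy) <- NULL

print(tidy)
```

After this step, `Year` is an ordinary column and each category is a variable, which matches the layout of the variable list shown further down.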
Before diving deeper into the analysis, let’s take a look at our newly created dataset. Our immigration dataset contains the following variables:
## [1] "Year" "GCR"
## [3] "Africa" "Asia"
## [5] "Europe" "North.America"
## [7] "Oceania" "South.America"
## [9] "Unknown" "Female"
## [11] "Male" "Under.16.years"
## [13] "X16.to.20.years" "X21.years.and.over"
## [15] "Married" "Single"
## [17] "Widowed" "Divorced.Separated"
## [19] "Unknown_marital_status"
First, let’s take a look at the overall evolution of immigration in the US over the last century. We can notice drops in the number of immigrants around the two world wars, a steady increase after WW2, and a spike around 1990.
Now that we have an overview of the evolution of immigration in the US over the last century, we are going to focus our study on the last 10 years.
The dataset I used doesn’t
For the past ten years, immigration has been fairly stable, and as you can see in the graph below, most immigrants come from Asia and North America. Together, these two regions account for 72.8% of total immigration on average.
## Oceania South.America Asia North.America
## Min. :0.4597 Min. : 6.726 Min. :36.04 Min. :31.44
## 1st Qu.:0.4730 1st Qu.: 7.363 1st Qu.:38.12 1st Qu.:31.88
## Median :0.4981 Median : 8.139 Median :40.17 Median :32.26
## Mean :0.5012 Mean : 8.131 Mean :39.68 Mean :33.11
## 3rd Qu.:0.5138 3rd Qu.: 8.766 3rd Qu.:41.35 3rd Qu.:34.42
## Max. :0.5797 Max. :10.121 Max. :42.52 Max. :36.10
## Europe Africa
## Min. : 7.895 Min. : 8.999
## 1st Qu.: 7.978 1st Qu.: 9.571
## Median : 8.354 Median : 9.665
## Mean : 8.616 Mean : 9.821
## 3rd Qu.: 9.180 3rd Qu.: 9.873
## Max. :10.126 Max. :11.235
## [1] 72.792
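For readers who want to reproduce this kind of summary, here is a minimal sketch of how the percentage shares and the combined average could be computed. The counts below are made up, not the DHS figures, and the sketch assumes that `GCR` holds the yearly total of green card recipients (including the `Unknown` region), which is my reading of the variable name rather than something documented above.

```r
# Hypothetical counts for two fiscal years; NOT the DHS figures.
immigration <- data.frame(
  Year = c(2015, 2016),
  GCR = c(100, 100),  # assumed to be the yearly total
  Africa = c(10, 12), Asia = c(40, 44), Europe = c(9, 8),
  North.America = c(32, 30), Oceania = c(1, 1),
  South.America = c(7, 4), Unknown = c(1, 1)
)
regions <- c("Oceania", "South.America", "Asia",
             "North.America", "Europe", "Africa")

# Percentage share of each region within each year
# (each row is divided by that year's total).
shares <- 100 * immigration[regions] / immigration$GCR
print(summary(shares))

# Average combined share of Asia and North America across years.
combined_mean <- mean(shares$Asia + shares$North.America)
print(combined_mean)
```

Dividing by the yearly total rather than by the sum of the six named regions means the shares do not add up to exactly 100% whenever some recipients have an unknown region of birth.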
Gender doesn’t seem to be a determining factor when trying to paint the picture of green card recipients over the past 10 years. The numbers of men and women receiving a green card are close, and which group is larger alternates from one year to the next.
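One way to check this kind of claim is to compute the female share per year and flag which group is larger. The sketch below uses made-up counts, not the DHS figures, and the `Female.share` and `Larger` columns are names introduced here for illustration only.

```r
# Made-up counts for three fiscal years; NOT the DHS figures.
gender <- data.frame(
  Year = 2014:2016,
  Female = c(52, 49, 51),
  Male = c(48, 51, 49)
)

# Share of women among recipients with a recorded sex, in percent.
gender$Female.share <- 100 * gender$Female / (gender$Female + gender$Male)

# Which group is larger in each year.
gender$Larger <- ifelse(gender$Female > gender$Male, "Female", "Male")

print(gender)
```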