Abstract

Immigration levels and economic conditions are closely intertwined. Many immigrants are motivated to visit or relocate to the United States by the desire to benefit from opportunities in the U.S. labor market that they do not have access to in their country of origin. The current presidential administration is proposing immigration reforms focusing on merit-based admissions and enforcement of existing laws that it asserts will benefit the U.S. economy. Political opponents claim the proposed policies have racial undertones and are meant to reduce diversity in the United States. This paper conducts correlational and regression coefficient analyses to dissect and describe the underlying relationships between legislation on immigration, demographics, and the U.S. economy. Based on these empirics, a conclusion is then drawn on the likely outcome of the proposed immigration reforms.

Introduction

Overview

Since the United States passed its first immigration law in 1790,1 the U.S. Congress has passed over 200 other immigration laws and amendments, an average of roughly one per year.2 Legislation has been exclusionary in some cases, such as the Chinese Exclusion Act of 1882, and inclusionary on other occasions, such as the Chinese Exclusion Repeal Act of 1943. The most recent immigration law passed was the Javier Vega, Jr. Memorial Act of November 2, 2017. The law named a Texas Border Patrol checkpoint posthumously in honor of Border Patrol agent Javier Vega, Jr. According to the White House, the next round of immigration reforms will focus on enforcement, DACA legalization, sponsorship limitations, and visa restrictions.3 Even though the impact of immigration laws has varied over time, high-impact reforms like those proposed tend to bring racial and/or ethnic intent into question. The intent behind the proposed reforms is controversial and cannot be quantified, but a change in direction is clearly under way. On February 23, 2018 the United States Citizenship and Immigration Services removed the phrase “nation of Immigrants” from its mission statement.

Former Mission Statement: USCIS secures America’s promise as a nation of Immigrants by providing accurate and useful information to our customers, granting immigration and citizenship benefits, promoting an awareness and understanding of citizenship, and ensuring the integrity of our immigration system.4

Current Mission Statement: U.S. Citizenship and Immigration Services administers the nation’s lawful immigration system, safeguarding its integrity and promise by efficiently and fairly adjudicating requests for immigration benefits while protecting Americans, securing the homeland, and honoring our values.5

Because these changes are happening amid a growing number of movements opposed to demographic change in the United States, several news outlets have conflated the proposed immigration reforms with these factions. Therefore, the significance of the proposed reforms is rooted not just in the economic impact on the United States and the global economy, but also in the intangible change in the United States’ identity from a ‘Nation of Immigrants’ to something less inclusionary. This research study will analyze the proposed immigration reforms in the context of the relationships between immigration laws, demographics, and the economy. There are empirical studies on these individual factors, but not many look at the relationship between immigration laws, demographics, and the economy throughout history. For example, studies by the U.S. Census Bureau on historic demographics do not address immigration laws or the economy. The Migration Policy Institute6 has reports on immigration flows based on data from the U.S. Department of Homeland Security, but this information is also mostly demographic. A recent article published in the Washington Post forecasts the impact of the proposed immigration reforms on demographics and the economy, but it does not address anything prior to 2015. This research study looks at the relationship between immigration laws, demographics, and the economy at a very granular level.

Theory

In the current political climate, restrictions on immigration are rationalized as a means to protect the average worker in the United States from the influx of low-cost labor. From an empirical and historical perspective, however, the proposed restrictions on immigration are likely to be motivated by desires to reduce the rate of racial and ethnic diversity in the United States. This research study will flesh out the empirics and history, but findings contradicting the theory that immigration negatively impacts wages have already been well documented by Kaufman and Hotchkiss (2006), who show in detail how this theory has not been fully supported by empirical findings.7 Standard economic theories about competitive markets posit that immigration increases the supply of labor, which in turn decreases the cost of labor (wages). Yet Kaufman and Hotchkiss note that the impact of immigration on wages has been found to be either nonexistent or small, even when the influx is large and sudden. Kaufman and Hotchkiss explain how these contradictory findings led to explanations such as those claiming that immigrants are not perfect substitutes for native workers due to language. They also describe researchers examining the same data with techniques that were “sufficiently refined and sophisticated to capture the true [negative] effect of immigration on wages.” Nowadays such repeated measures, if not controlled for, would constitute data dredging. Although the impact of immigration on wages is debatable, Kaufman and Hotchkiss are unambiguous about the positive correlation between immigration and economic growth: economic booms and immigration go hand-in-hand.

Hypothesis

The frequency of legislation regarding immigration fluctuates when matters related to immigration are under review. Therefore, the number of laws being passed on immigration is a useful sentiment indicator for how the United States feels about past and future immigration. The hypothesis of this analysis is that there exists a feedback loop between the number of immigration laws enacted and the demographic composition of the United States. This feedback loop is a bidirectional relationship where outputs of one factor are inputs for another, and then the outputs of the latter factor are routed back as inputs to the former factor, in an ongoing cycle. Specifically, the number of immigration laws enacted in the future is a function of demographic changes–and demographic changes are a function of the number of immigration laws enacted in the past. It is further hypothesized that the positive correlation that Immigrants share with various sectors of the U.S. economy holds regardless of the immigrant’s country of origin. Therefore, the proposed immigration restrictions based on country of origin are more likely to impact the diversity rather than the economy of the United States.

Question

The current presidential administration is proposing immigration reforms focusing on merit-based admissions and enforcement of existing laws that it asserts will benefit the economy. Political opponents claim the proposed policies have racial undertones and are meant to reduce the diversity of Immigrants. How have immigration reforms impacted the economy and the demographics of the United States historically and what is the likely outcome of the proposed immigration reforms?

Background

Definitions

The U.S. Census Bureau defines Natives as natural born citizens and children born abroad to citizen parent(s). Foreign-born are defined as permanent residents, naturalized citizens, authorized migrants, and unauthorized migrants. Other Race is a less well-defined category that has been found to capture different racial and ethnic categories which survey respondents feel are not reflected in the defined categories. Some examples are Hispanics before a category was added, individuals who identify race with country of origin, and multiracial individuals. The U.S. Department of Homeland Security refers to Foreign-born as aliens and subdivides their population into Immigrants, Nonimmigrants, and Inadmissible. Immigrants are foreigners permanently relocating to the U.S. and Nonimmigrants are foreigners relocating to the U.S. temporarily.8 Foreigners who are Inadmissible are those “not permitted by law to enter or remain in the United States.” A visa is an authorization to enter the United States. “A citizen of a foreign country who seeks to enter the United States generally must first obtain a U.S. visa.”

Literature Review

The U.S. Census Bureau has a demographic study examining the Foreign-born population by country from 1850 to 2010.9 Interesting revelations in this study are the steady decline in the Foreign-born population from 1910 to 1970 and the steady increase thereafter. In the last period examined, 2010, the Foreign-born population is still not as high as in the period between 1860 and 1910. The study breaks down Natives and Foreign-born by State, Country of Origin, education level, preferred language, participation in the labor force, occupation, income, and age.

An article published by the U.S. National Archives in 2002 by Senior Historian Marian Smith takes an in-depth look at demographics and immigration law. “Race, Nationality, and Reality” goes into great detail about how immigration policy is underlain by a long history of racism in which Immigrants were excluded based on race and ethnicity. Immigration and nationality were legislated separately in the early years of the United States. Early Immigration laws focused on keeping out the poor and sick. Early Nationality laws focused on granting citizenship to “free white persons” who immigrated from Europe while excluding African-Americans and untaxed Native-Americans. Federal citizenship was not even seen as that important until after the Civil War. So, when immigration and naturalization were placed under the administration of one agency, the coordination and uniform application of different laws designed to restrict race (as defined by color) and geographic origin became a problem. Attempts to mitigate the confusion by focusing only on ethnicity exacerbated the problems. Citizenship was being granted and denied based on color, origin, or ethnicity depending on the individual case. Several pieces of legislation tried, but failed, to codify the “common understanding” of who was entitled to citizenship. Exceptions either for or against citizenship became commonplace. The lack of clarity and guidance resulted in courts flooded with appeals. Smith states that “there was no settled opinion on what constituted evidence of race [and] as long as nationality law contained racial requirements for naturalization, and immigration law excluded those ineligible to naturalize,” there was no way to stop the problem. Then, in 1952, immigration and naturalization laws were finally combined into a more cogent form with racial requirements completely removed.10

Another study, published in the Washington Post on February 26, 2018, looks at how the cuts to legal immigration proposed by the current presidential administration will impact demographics. The report states that the "immigration plan could keep whites in U.S. majority for up to five more years…[by] greatly slashing the number of Hispanic and black African Immigrants entering the United States." This study looks at the proposed immigration policies in combination with population projections, fertility figures, and visa backlog information. It concludes by postulating that the proposed reforms will decrease the size of the population, which will in turn decrease the size of the workforce and, with it, the size of the economy.11

Primary Data Sources

U.S. Census Bureau

The U.S. Census Bureau has a wealth of information on its website12 regarding the U.S. population going back as far as 1790.13 The data compiled on race and Hispanic origin consist of the Historical Census Statistics on Population Totals by Race and by Hispanic Origin14 which spans 1790 to 1990 on race and 1850 to 1990 on Hispanic origin. Hispanic origin is defined as Spanish mother tongue, culture, surname, or origin regardless of race. More recent observations of these data are available in the Profile of General Demographic Characteristics (DP-1) data tables from the 2000 Census15 and the 2010 Census.16 The most recent observations of these data, from 2011 to 2016, are found in the 1-year American Community Survey Demographic and Housing Estimates (DP5) data tables.

The data on the Foreign-born population come from tables in the Historical Census Statistics on the Foreign-born Population of the United States17 which spans 1850 to 2000. The Nativity of the Population18 table outlines the number of persons that are native and Foreign-born. The Region of Birth of the Foreign-born Population19 table breaks down the origin of the Foreign-born population into Europe, Asia, Africa, Oceania, Latin America, and Northern America. The details on which countries are sub-regions of these macro geographic regions are outlined on pages 46-47 of the United Nations 2002 Demographic Yearbook.20 The most recent observations of these data, from 2011 to 2016, are found in the 1-year Selected Characteristics of the Native and Foreign-born Populations (S0501) data tables.

The dataset compiled from all these Census data is quite impressive, but it is not yet ready for use. Several discrepancies must be addressed first. Although analyzing the cause of these discrepancies is not a subject of this paper, it is worth noting that, while not surprising given the history of hegemony in the United States, the discrepancies are rather unsettling. Notes from the metadata explain that:

  1. Although the Census includes data on the free and slave African-American population, “information on nativity was not collected for slaves.”
  2. “The attempt to enumerate all American Indians started in 1890” with earlier estimates being inaccurate because “American Indians not taxed (i.e., living in tribal society) were not included in the enumeration.”
  3. The 1940 Hispanic population figure only included the “White population of Spanish mother tongue”.
  4. In 1940 and 1950, there is Foreign-born data missing “because data on the Foreign-born population by country of birth in census publications for these years are limited almost entirely to the White population.”

To address the discrepancy caused by (1), the number of slaves disembarking in the United States21 is used as a proxy for Foreign-born Africans entering the United States from 1790 to 1880. To address (2), the Native-American population figures are backfilled using the percent change22 of the Native-American population starting from the earliest period without discrepancies which is 1890. Native born and total population figures are also adjusted to reflect the revised Native-American population figures. To address (3), the value was removed since it was not representative of all Hispanics. Then, to complete the dataset, imputation is used. The missing data outlined in (4) for the Foreign-born in 1940 and 1950 is filled in using linear interpolation between 1930 and 1960. Linear interpolation is also used to fill in missing annual Census data. This is because the Census data obtained is decennial from 1790 to 2010 and then annual until 2016; while all the other non-Census data are annual.

\[y=y_{ 1 }+\left( x-x_{ 1 } \right) \left( \frac { y_{ 2 }-y_{ 1 } }{ x_{ 2 }-x_{ 1 } } \right)\]
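The interpolation formula above can be sketched in R as follows. This is a minimal illustration rather than the paper's own code: the two known points are made-up values, and `interpolate` is a hypothetical helper name. Base R's `approx()` is used as a cross-check.

```r
# Two known points (hypothetical values, not Census figures).
x1 <- 1930; y1 <- 100
x2 <- 1960; y2 <- 160

# Direct implementation of y = y1 + (x - x1) * (y2 - y1) / (x2 - x1).
interpolate <- function(x, x1, y1, x2, y2) {
  y1 + (x - x1) * (y2 - y1) / (x2 - x1)
}

# Fill in the two missing decennial years between the known points.
missing_years <- c(1940, 1950)
manual <- interpolate(missing_years, x1, y1, x2, y2)

# Base R's approx() performs the same linear interpolation.
builtin <- approx(x = c(x1, x2), y = c(y1, y2), xout = missing_years)$y

print(cbind(year = missing_years, manual, builtin))
```

The same call with `xout` set to every year between two Census years would yield the annual series needed to align decennial data with annual data.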

Linear interpolation23 makes no assumption on distribution. It draws a straight line between two known points and returns a value along that line for a given point in the domain. The remainder of the missing data is filled in using linear regression based on the natural log of the total population and the natural log of the variable being imputed.

\[\hat { y } ={ e }^{ \left( \beta _{ 0 }+\ln { \left( x_{ i } \right) } \beta _{ 1 } \right) }\]
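A minimal sketch of this log-log regression imputation is shown below, under stated assumptions: `x` stands in for total population and `y` for the variable being imputed, and both series are made-up values chosen so that the relationship is exactly linear on the log scale.

```r
# Hypothetical series: x plays the role of total population,
# y the variable with missing observations (here y = 0.2 * x where observed).
x <- c(10, 20, 40, 80, 160, 320)
y <- c(2, NA, 8, NA, 32, 64)

observed <- !is.na(y)

# Fit ln(y) = b0 + b1 * ln(x) using only the observed rows.
fit <- lm(log(y) ~ log(x), subset = observed)

# Back-transform the predictions: y_hat = exp(b0 + b1 * ln(x)).
y_hat <- exp(predict(fit, newdata = data.frame(x = x)))

# Keep observed values and fill only the gaps with fitted values.
y_imputed <- ifelse(observed, y, y_hat)
print(y_imputed)
```

One caveat: exponentiating a prediction made on the log scale estimates the conditional median rather than the mean, so imputed values from real data may run slightly low if the residual variance is large.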

The exceptions to the imputation via interpolation and regression are the statistics on Foreign-born persons from Europe and North America. Linear regression is not suited for these nonlinear data. Instead, the imputed values for the other regions are subtracted from the total Foreign-born population, and the remainder is distributed between the two regions according to weights based on their historical averages. The result is a complete dataset24 from 1790 to 2016 on race, Hispanic origin, nativity, and country of origin for Foreign-born.
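The residual allocation for Europe and North America might look like the sketch below. All numbers are hypothetical, and the weights are assumed to be each region's historical average share relative to the other; the paper does not state how the weights were computed.

```r
# Hypothetical values for a single year (illustrative only).
total_foreign_born <- 1000
other_regions <- c(Asia = 120, Africa = 30, Oceania = 10, Latin_America = 240)

# Assumed historical average weights of the two remaining regions.
weights <- c(Europe = 0.8, Northern_America = 0.2)

# Whatever the other regions do not account for is the residual.
residual <- total_foreign_born - sum(other_regions)

# Split the residual in proportion to the (normalized) weights.
allocated <- residual * weights / sum(weights)
print(allocated)
```

Normalizing by `sum(weights)` guarantees that the regional figures add back up to the total Foreign-born population.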

U.S. Department of State

Immigrant visas are issued to foreigners permanently relocating to the U.S. and Nonimmigrant visas are issued to foreigners relocating to the U.S. temporarily. Data regarding visas is available both from the U.S. Department of Homeland Security25 and the U.S. Department of State.26 The data27 from the U.S. Department of State shows every Nonimmigrant visa type28 by country from 1997 to 2017. This dataset has an extraordinary amount of detail, but only on Nonimmigrant visas issued by consular offices which is only a fraction of the Nonimmigrant visas. Therefore, these data are not suited for this analysis.

U.S. Department of Labor

Certain Immigrant and Nonimmigrant employment related visas require approval from the U.S. Department of Labor. Employers “must obtain foreign labor certification from the U.S. Department of Labor, prior to filing a petition” for these workers. The process of obtaining certification from the U.S. Department of Labor involves substantiating29 that “there are not sufficient U.S. workers able, willing, qualified and available to accept the job opportunity in the area of intended employment and that employment of the foreign worker will not adversely affect the wages and working conditions of similarly employed U.S. workers.” The types of employment related visas subject to this certification process are:

  • E3: Australian Free Trade Agreement principal’s family.
  • H1B: Temporary workers in specialty occupations.
  • H1B1: Chile and Singapore Free Trade Agreement aliens.
  • H2A/H2B: Agricultural and Nonagricultural workers.
  • D1: Crewman on a marine vessel or aircraft.

The data30 maintained by the U.S. Department of Labor on these employment related visas lists all information, including status, on every application from 2008 to 2017. There also exists an external source31 that has harvested data dating back to 2000. Although the U.S. Department of Labor data has an extraordinary amount of detail, it only encompasses a small number of the available employment related visas.32

U.S. Department of Homeland Security

The U.S. Citizenship and Immigration Services division of the U.S. Department of Homeland Security has compiled the most complete list of immigration legislation available for the period between 1790 and 1996. The list is not currently available from the source33 but it continues to exist in secondary sources such as the Immigration Daily34 website and the book Immigration, Assimilation, and Border Security by Yoku Shaw-Taylor from 2012. The list is fairly exhaustive, but it is missing about eight pieces of legislation from a smaller list35 maintained by the U.S. National Archives. Legislation from 1997 on can be found by querying a database36 maintained by the U.S. Congress. The database contains legislation since the 93rd Congress in 1973. The legislation on immigration that has become law since the 105th Congress in 1997 from this database, when combined with the information from the U.S. Citizenship and Immigration Services and U.S. National Archives, produces a complete list37 of immigration legislation from 1790 to 2017.

The U.S. Citizenship and Immigration Services division also maintains data on immigration application forms by type (Naturalization, Employment, Family, Humanitarian, Other) and approval status from 2012 to 2017. Data on I-129 applications used for employment are available since 2009. Combining all years of available data from the U.S. Citizenship and Immigration Services results in an unremarkable dataset.38 The most useful data39 on immigration relevant to this research are the statistics produced by the U.S. Department of Homeland Security in their annual yearbooks. The data on Immigrant visas from their Persons Obtaining Lawful Permanent Resident Status by Type and Major Class Of Admission40 tables span 1986 to 2016. The data on Nonimmigrant visas from their Nonimmigrant Admissions by Class of Admission41 tables span 1981 to 2016, with aggregate data on Immigrant visas since 1820. There are also data on Aliens Apprehended42 from 1925 to 2016, as well as on Aliens Removed or Returned43 from 1892 to 2016. Apprehensions are arrests of foreigners, Removals are compulsory departures based on a court order, and Returns represent immigrants stopped (without a court order) from entering at the borders. From these data, a subset of the below Immigrant and Nonimmigrant employment-related visas has been carved out for additional analysis:

  • CW1/CW2: CNMI-only transitional workers and family.
  • E1, E2, E2C: Treaty traders or investors and family.
  • E3: Australian Free Trade Agreement principal’s family.
  • H1B: Temporary workers in specialty occupations.
  • H1B1: Chile and Singapore Free Trade Agreement aliens.
  • H1C: Registered nurses participating in the Nursing Relief for Disadvantaged Areas.
  • H2A/H2B/H2R: Agricultural/Nonagricultural/Returning workers.
  • H3/H4: Trainees and Spouses and children of H1, H2, or H3.
  • I1: Representatives of foreign information media and family.
  • L1/L2: Intracompany transferees and family.
  • O1/O2/O3: Workers with extraordinary ability, assistants, and family.
  • P1/P2/P3/P4: Internationally recognized athletes, entertainers, or artists, and family.
  • Q1/Q2/Q3: Workers in international cultural exchange programs and family.
  • R1/R2: Workers in religious occupations and family.
  • TN/TD: NAFTA professional workers and family.

The full join of these Immigrant and Nonimmigrant visa data is made uniform across the combined time domain by imputing missing years. Nonimmigrant data are imputed for 1982-1984, 1986-1988, and 1997 using linear interpolation between the surrounding years where data are present: [[1981, 1985], [1985, 1989], [1996, 1998]]. The missing Immigrant data from 1981 to 1985 are filled in by first imputing total visas issued using the historic average proportion of Nonimmigrant visas relative to total visas, then subtracting Nonimmigrant visas from the derived number of total visas. The resulting dataset44 has complete cases on employment and total visas for both Immigrants and Nonimmigrants from 1981 to 2016. The dataset also includes Removals since 1892, Apprehensions since 1925, and Returns since 1927. No imputation was needed for these three enforcement actions since there are no missing data during the 1981 to 2016 period.
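The two-step Immigrant imputation described above can be sketched as follows. The figures are made up, and the sketch assumes that the "historic average proportion" is the mean of the yearly Nonimmigrant shares; a ratio of period totals would be an equally plausible reading.

```r
# Hypothetical years in which both visa series are observed (made-up values).
known <- data.frame(
  nonimmigrant = c(900, 800, 700),
  immigrant    = c(100, 200, 300)
)

# Average share of Nonimmigrant visas in total visas.
# Assumption: the mean of the yearly shares is used.
share <- mean(known$nonimmigrant / (known$nonimmigrant + known$immigrant))

# Hypothetical year in which only the Nonimmigrant count is available.
nonimmigrant_only <- 400

# Step 1: derive total visas from the Nonimmigrant count and the share.
total_imputed <- nonimmigrant_only / share

# Step 2: Immigrant visas are the derived total minus Nonimmigrant visas.
immigrant_imputed <- total_imputed - nonimmigrant_only

print(c(share = share, total = total_imputed, immigrant = immigrant_imputed))
```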

U.S. Bureau of Economic Analysis

The U.S. Bureau of Economic Analysis has data on Gross Output by Industry45 from 1987 to 2016. The data are very granular and consist almost entirely of complete cases. The exceptions are from 1987 to 1996 in the subcategories of (1) Housing, (2) Other Real Estate, (3) General Government National Defense, (4) General Government Nondefense, and (5) Nursing and Residential Care Facilities. For subcategories (1), (2), (3), and (4), imputations are done using the average proportion of the child node under the parent node from the data that are present. This imputation is relatively easy since these subcategories are the only child nodes of Real Estate and General Government. The imputation for (5) is done using its average proportion relative to the sum of its sibling nodes, Ambulatory Health Care Services and Hospitals. The two existing sibling nodes on which the imputations are based are then adjusted downward to reflect the new sibling node under the Health Care parent. Failing to adjust them downward would inflate Health Care. The adjustment of the existing sibling nodes is based on the average proportion of each node relative to the other in the existing data. The data also require format changes as a final step. The format they are provided in is great for display purposes, but it is not amenable to analysis. The final compilation is a complete dataset46 from 1987 to 2016 that breaks down GDP by sector: Agriculture, Mining, Utilities, Construction, Manufacturing, Wholesale, Retail, Transportation, Information, FIRE, Service, Education, Entertainment, Other, Federal, State and Local.
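The child-under-parent imputation used for subcategories (1) through (4) can be illustrated with a short sketch. The values and column names are hypothetical, and `impute_child` is an illustrative helper, not a function from the paper.

```r
# Hypothetical gross output for a parent node and its two child nodes.
gross_output <- data.frame(
  year        = 1995:1998,
  real_estate = c(200, 220, 240, 260),   # parent, fully observed
  housing     = c(NA, NA, 180, 195),     # child with missing early years
  other       = c(NA, NA, 60, 65)        # child with missing early years
)

# Fill missing child values with parent * (average observed child/parent share).
impute_child <- function(child, parent) {
  observed <- !is.na(child)
  share <- mean(child[observed] / parent[observed])
  ifelse(observed, child, parent * share)
}

gross_output$housing <- impute_child(gross_output$housing, gross_output$real_estate)
gross_output$other   <- impute_child(gross_output$other,   gross_output$real_estate)
print(gross_output)
```

Because the average shares of the only two children sum to one, the imputed children add up to the parent in every year, which is why no downward adjustment is needed in this case (unlike the Health Care case, where a new sibling is introduced).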

Data Characteristics

The dataset47 constructed from the arduous data gathering and cleaning process described above consists of Immigration Laws and Census data from 1790 to 2016, employment visa data from 1981 to 2016, enforcement data from 1925 to 2016, and GDP by Industry from 1987 to 2016. Correlational research is being conducted on the dataset to describe relationships between immigration law, demographics, employment visas, and the economy. The significance of the relationships between variables is evaluated using formal statistical tests.

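library(ggplot2)  # plotting
library(tidyr)    # gather()
library(zoo)      # na.approx()

github <- "https://raw.githubusercontent.com/jzuniga123"
file <- "/SPS/master/DATA%20698/DATA698_Project.csv"
data698 <- read.csv(paste0(github, file), header=TRUE)
data698 <- data698[order(data698$Year), ]
# Convert to an annual time series indexed by Year (the Year column is dropped)
data698 <- ts(data698[, -1], frequency=1, 
  start=min(data698[, 1]), end=max(data698[, 1]))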

Legislation

Laws <- data698[, 1]

Historic immigration law numbers have an interesting pattern. The period between 1790 and 1931 saw an average of 0.303 pieces of legislation per year. The relative frequency of legislation did increase around the time of the Mexican-American War in 1846, but the number of laws passed remained relatively low. Then in 1932, in the midst of the Great Depression, the number and frequency of immigration laws started to increase steadily. Since 1932 an average of 2.059 immigration laws have been passed every year. Looking only at the period since 1997 shows an average of 3.65 immigration laws passed annually. The last 21 years have seen legislation on immigration enacted at a frequency over 10 times greater than during the first 141 years of Federal legislative action on immigration.
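Period averages of this kind can be computed by slicing a time series with `window()`. The sketch below uses a short made-up series rather than the paper's `Laws` data, so the numbers it prints are illustrative only.

```r
# Hypothetical annual counts of laws passed (not the paper's data).
laws <- ts(c(0, 1, 0, 0, 2, 1, 3, 4, 2, 6), start = 1990, frequency = 1)

# Split the series into an early and a late period.
early <- window(laws, end = 1994)
late  <- window(laws, start = 1995)

# Average number of laws per year in each period, and their ratio.
mean_early <- mean(early)
mean_late  <- mean(late)
ratio <- mean_late / mean_early

print(c(early = mean_early, late = mean_late, ratio = ratio))

# Applied to the averages reported in the text, assuming both are annual rates:
3.65 / 0.303   # roughly 12
```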

ggplot() + ggtitle("Legislation on Immigration (1790-2016)") + 
  geom_line(aes(y = Laws, x = time(Laws)), stat="identity", col="blue") +
  xlab("Year") + theme(plot.title = element_text(hjust = 0.5))

Notable periods of increased legislative activity that were outliers in their era are:

  • 1798: Three laws passed in response to the Quasi-War with France.
  • 1882: Two laws passed, one being the Chinese Exclusion Act.
  • 1887: Two laws passed restricting labor contracts and property rights.
  • 1903: Two laws passed, one restricting anarchists.
  • 1932: Four laws passed, three restricting labor contracts.
  • 1940: Four laws passed, three enhancing enforcement activities.
  • 1950: Four laws passed, three providing relief to orphans, military, and sheepherders.
  • 1978: Seven laws passed, two restrictive to agriculture and travel, five progressive: addressing bias in admissions, expulsion of Nazis, human smuggling, and adoption.
  • 2000: Thirteen laws passed during an election year, eleven prior to elections.
  • 2002: Ten laws passed in the wake of the September 11, 2001 terrorist attacks.
  • 2008: Six laws passed during an election year, all prior to elections.

Demographics

Census <- data698[, 2:17]

Race and Hispanic Origin (RAHO)

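RAHO <- Census[, 2:8]
# Combine columns 2 and 3 (counted separately in the source) into one series
RAHO[, 2] <- RAHO[, 2] + RAHO[, 3]
# Drop everything from the first underscore on in the combined column's name
colnames(RAHO)[2] <- gsub("_.*", "", colnames(RAHO)[2])
RAHO <- RAHO[, -3]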

The purely numerical data from the decennial Census on race and Hispanic origin have two interesting features. As is usual with most social phenomena, the distributions appear mostly exponential. The two exceptions are the White race, which can be described as linear since about 1870, and the Native-American race, which has a positive parabolic shape with a minimum at the year 1900. For the purposes of aggregate analysis, the free and enslaved African-American populations that the Census counts separately until 1860 are counted together.

numb_df <- data.frame(na.approx(RAHO), Year=time(RAHO))
plot_df <- gather(numb_df, "Demographic", "Population", 1:(ncol(numb_df)-1))
ggplot(plot_df, aes(x=Year, y=Population)) + geom_line() + 
  facet_wrap(~ Demographic, scales="free_y") + theme_bw() +
  ggtitle("Race and Hispanic Origin (1790-2016)") + 
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

ggplot(plot_df, aes(x=Year, y=Population, fill= Demographic)) +
  ggtitle("Race and Hispanic Origin (1790-2016)") + 
  geom_area(alpha = 0.75) + xlab("Year") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma) + 
  theme(legend.position="bottom")

When the Census data on race and Hispanic origin are viewed as a percentage of total population, however, some striking details emerge. Hispanics, Asians, Pacific Islanders, and Other races grow exponentially as a percentage of total population over time. Native-Americans have decreased exponentially as a percentage of total population over time, from 14.2% in 1790 to 0.7% in 2016. The African-American percentage of the population has decreased overall but in a cubic manner, with a minimum of 9.5% in 1930 and two local maxima of 17.7% in 1810 and 11% in 1990. The White percentage of the population has a negative parabolic shape with a maximum of 89% in 1920 and an all-time low of 61.7% in 2016. In aggregate, many of these individual characteristics seem less remarkable, but a demographic shift is clear.

Population <- rowSums(RAHO)
Proportions <- RAHO / Population
prop_df <- data.frame(na.approx(Proportions), Year=time(Proportions))
plot_df <- gather(prop_df, "Demographic", "Population", 1:(ncol(prop_df)-1))
ggplot(plot_df, aes(x=Year, y=Population)) + geom_line() + 
  facet_wrap(~ Demographic, scales="free_y") + theme_bw() +
  ggtitle("Race and Hispanic Origin Proportions (1790-2016)") + 
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

ggplot(plot_df, aes(x=Year, y=Population, fill= Demographic)) +
  ggtitle("Race and Hispanic Origin Proportions (1790-2016)") +
  geom_area(alpha = 0.75) + xlab("Year") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma) + 
  theme(legend.position="bottom")

Native and Foreign-born (NAFB)

NAFB <- Census[, 9:10]

The decennial numbers on the Native and Foreign-born population also follow what can be described as an exponential pattern. There was a decline in the Foreign-born population from 1940 to 1970, but by 1980 the number returns to its prior levels and continues on its prior trajectory. When aggregated, the fluctuation in the Foreign-born population is much less noticeable.

numb_df <- data.frame(na.approx(NAFB), Year=time(NAFB))
plot_df <- gather(numb_df, "Demographic", "Population", 1:(ncol(numb_df)-1))
ggplot(plot_df, aes(x=Year, y=Population)) + geom_line() + 
  facet_wrap(~ Demographic, scales="free_y", ncol = 1) + theme_bw() +
  ggtitle("Native and Foreign-born (1790-2016)") + 
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

ggplot(plot_df, aes(x=Year, y=Population, fill= Demographic)) + 
  ggtitle("Native and Foreign-born (1790-2016)") +
  geom_area(alpha = 0.75) + xlab("Year") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma) + 
  theme(legend.position="bottom")

The proportions of Native and Foreign-born persons in the population are mirror images of each other (reflections across a horizontal line), since the two proportions sum to one. The median proportion of Foreign-born persons in the United States since 1790 has been 12.4%. In 1850 and 1970 the percentage of Foreign-born persons in the United States reached local minima of 9.5% and 4.8%, respectively. The proportion of Foreign-born persons in the United States has peaks in 1870, 1890, and 1910 at 14.3%, 14.8%, and 14.7%, respectively. In 2016 the percentage of Foreign-born sits at 13.5%. When viewed in aggregate, the decline in the percentage of Foreign-born persons in the United States in 1970 appears to be significant relative to its proportion over time.

Population <- rowSums(NAFB)
Proportions <- NAFB / Population
prop_df <- data.frame(na.approx(Proportions), Year=time(Proportions))
plot_df <- gather(prop_df, "Demographic", "Population", 1:(ncol(prop_df)-1))
ggplot(plot_df, aes(x=Year, y=Population)) + geom_line() + 
  facet_wrap(~ Demographic, scales="free_y", ncol = 1) + theme_bw() +
  ggtitle("Native and Foreign-born Proportions (1790-2016)") + 
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

ggplot(plot_df, aes(x=Year, y=Population, fill= Demographic)) +
  ggtitle("Native and Foreign-born Proportions (1790-2016)") +
  geom_area(alpha = 0.75) + xlab("Year") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma) + 
  theme(legend.position="bottom")

Foreign-born Country of Origin (FBCO)

FBCO <- Census[, 11:16]

The decennial numbers on country of origin for the Foreign-born population are more varied. The numbers of persons from Africa, Asia, Oceania, and Latin America grow at an approximately exponential rate. Arrivals from Europe and North America, however, show exponential growth until around 1930 and a steady decline thereafter. When aggregated, these data show a large hump in the number of Foreign-born Europeans that begins to deflate when race stopped being a factor in admissions around 1950.

numb_df <- data.frame(na.approx(FBCO), Year=time(FBCO))
plot_df <- gather(numb_df, "Demographic", "Population", 1:(ncol(numb_df)-1))
ggplot(plot_df, aes(x=Year, y=Population)) + geom_line() + 
  facet_wrap(~ Demographic, scales="free_y") + theme_bw() +
  ggtitle("Foreign-born Country of Origin (1790-2016)") + 
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

ggplot(plot_df, aes(x=Year, y=Population, fill= Demographic)) +
  ggtitle("Foreign-born Country of Origin (1790-2016)") +
  geom_area(alpha = 0.75) + xlab("Year") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma) + 
  theme(legend.position="bottom")

The proportion of Foreign-born persons in the United States by country of origin shows exponential increases from Asia, Oceania, and Latin America, just like the raw counts. The large increases start around 1860 for Oceania and around 1970 for Latin America and Asia. Foreign-born persons from Africa follow a less natural pattern that coincides with slavery: the percentage was high from 1790 to 1820 but relatively low thereafter. In the post-slavery era, the proportion of Foreign-born persons from Africa shows a more natural exponential pattern, with a large increase starting in 1980. The proportion of Foreign-born persons from Europe remained above 80% until 1950, when a permanent and rapid decline started, leading to its current low of 10.9% in 2016. The proportion of Foreign-born persons from North America has an unusual pattern, hovering around 8.9% until 1970, when a rapid decline started, leading to its current low of 1.8% in 2016. The rapid decline of Europeans starting in 1950 and the rapid decline of North Americans starting in 1970 are temporally proximate to when race stopped being a factor in admissions around 1950. When aggregated, the fall from dominance by Foreign-born Europeans and the rise to dominance by Latin Americans and Asians over time is quite remarkable.

Population <- rowSums(FBCO)
Proportions <- FBCO / Population
prop_df <- data.frame(na.approx(Proportions), Year=time(Proportions))
plot_df <- gather(prop_df, "Demographic", "Population", 1:(ncol(prop_df)-1))
ggplot(plot_df, aes(x=Year, y=Population)) + geom_line() + 
  facet_wrap(~ Demographic, scales="free_y") + theme_bw() +
  ggtitle("Foreign-born Country of Origin Proportions (1790-2016)") + 
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

ggplot(plot_df, aes(x=Year, y=Population, fill= Demographic)) +
  ggtitle("Foreign-born Country of Origin Proportions (1790-2016)") +
  geom_area(alpha = 0.75) + xlab("Year") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma) + 
  theme(legend.position="bottom")

Immigration

Visas

Visas <- data698[, 18:21]
Visas[, 2] <- Visas[, 2] - Visas[, 1]
Visas[, 4] <- Visas[, 4] - Visas[, 3]
colnames(Visas)[c(2,4)] <- c("IV_NonEmp", "NIV_NonEmp")

The numeric data on Immigrant Visas since 1820 is best described as chaotic. It has wild swings up and down starting with an upward trend from 1820 to about 1910, followed by a drastic decline until about 1945, and then another trend upward until 2016 with a trajectory similar to that of the first upward trend. The graph also shows a huge spike around 1990.

IV_All <- na.omit(data698[, 19])
numb_df <- data.frame(IV_All, Year=time(IV_All))
ggplot() + ggtitle("Immigrant Visas (1820-2016)") + xlab("Year") + 
  geom_line(aes(y = IV_All, x = time(IV_All)), stat="identity", col="blue") +
  ylab("Amount") + theme(plot.title = element_text(hjust = 0.5))

Visa data since 1981 has been less chaotic. The number of Immigrant Visas related to employment held steady from 1981 to 1991, was volatile from 1992 to 2007, and has been steady ever since. There were large decreases around 1999 and 2004, and large increases around 1993, 2001, and 2006. Immigrant Visas not related to employment follow a steadier pattern, the exception being the huge spike around 1990 mentioned earlier. Employment-related Nonimmigrant Visas have a less remarkable pattern: a steadily increasing trend with only a slight amount of variability. Nonimmigrant Visas not related to employment have an even steadier increasing trend with less variability. When viewed in aggregate, the variability in most Visa types becomes mostly unnoticeable. This is because, in the aggregated Visa figures, Nonimmigrant Visas not related to employment make up over 90% of the Visas issued throughout the period examined. These Nonimmigrant Visas are temporary Visas for tourists, diplomats, athletes, and other temporary visitors.

Visa_Types <- window(Visas, start = 1981)
numb_df <- data.frame(Visa_Types, Year=time(Visa_Types))
plot_df <- gather(numb_df, "Type", "Visas", 1:(ncol(numb_df)-1))
ggplot(plot_df, aes(x=Year, y=Visas)) + geom_line() + 
  facet_wrap(~ Type, scales="free_y") + theme_bw() +
  ggtitle("Immigrant and Nonimmigrant Visas by Type (1981-2016)") + 
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

ggplot(plot_df, aes(x=Year, y=Visas, fill= Type)) +
  ggtitle("Immigrant and Nonimmigrant Visas by Type (1981-2016)") +
  geom_area(alpha = 0.75) + xlab("Year") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma) + 
  theme(legend.position="bottom")

Enforcement

Enforce <- data698[, 22:24]

Enforcement data has some interesting patterns. Removals remained at a constant rate until 1996 when they increased substantially. Apprehensions and Returns follow almost identical patterns. There was a large increase in enforcement in the ten-year period between 1945 and 1955 with its peak being reached in 1954. The increase started again in 1965 and has remained elevated ever since with peaks in 1986, 1996, and 2000. Around the time Removals started steadily increasing, Apprehensions and Returns started steadily decreasing. When aggregated, the three enforcement actions follow the pattern of the latter two enforcement actions which make up the bulk of the actions. Another characteristic that emerges in these data when aggregated is the recent decline in Apprehensions and Returns and the recent increase in Removals.

Enforce_Window <- window(Enforce, start = 1925)
numb_df <- data.frame(Enforce_Window, Year=time(Enforce_Window))
plot_df <- gather(numb_df, "Method", "Actions", 1:(ncol(numb_df)-1))
ggplot(plot_df, aes(x=Year, y=Actions)) + geom_line() + 
  facet_wrap(~ Method, scales="free_y", ncol = 1) + theme_bw() +
  ggtitle("Enforcement by Method (1925-2016)") + 
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

ggplot(plot_df, aes(x=Year, y=Actions, fill= Method)) +
  ggtitle("Enforcement by Method (1925-2016)") +
  geom_area(alpha = 0.75) + xlab("Year") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma) + 
  theme(legend.position="bottom")

GDP by Industry

Economy <- data698[, 26:41]

GDP by industry over the past 30 years shows the same mostly homoskedastic upward trend for nearly every industry. The exceptions are Agriculture, Mining, Utilities, and Construction, whose heteroskedasticity suggests that they are more susceptible to business cycles. The homoskedastic upward trend dominates aggregated income, with the exception of the period encompassing the Great Recession.

Economy_Window <- window(Economy, start = 1987)
numb_df <- data.frame(Economy_Window, Year=time(Economy_Window))
plot_df <- gather(numb_df, "Sector", "Income", 1:(ncol(numb_df)-1))
ggplot(plot_df, aes(x=Year, y=Income)) + geom_line() + 
  facet_wrap(~ Sector, scales="free_y") + theme_bw() +
  ggtitle("GDP by Sector (1987-2016)") + 
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

ggplot(plot_df, aes(x=Year, y=Income, fill= Sector)) +
  ggtitle("GDP by Sector (1987-2016)") + 
  geom_area(alpha = 0.75) + xlab("Year") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

When examining the industries as a proportion of the economy, their individual shares do fluctuate slightly but in aggregate their relative shares remain mostly constant over the last 30 years. Proportionally, the bulk of GDP by order of share is maintained by the Manufacturing, FIRE (Finance, Insurance, and Real Estate), Service, and Education sectors. Over time there has been a slight decrease in the proportion of the economy captured by the Manufacturing sector and slight increases in the FIRE and Service sectors.

GDP <- rowSums(Economy_Window)
Proportions <- Economy_Window / GDP
prop_df <- data.frame(na.approx(Proportions), Year=time(Proportions))
plot_df <- gather(prop_df, "Sector", "Income", 1:(ncol(prop_df)-1))
ggplot(plot_df, aes(x=Year, y=Income)) + geom_line() + 
  facet_wrap(~ Sector, scales="free_y") + theme_bw() +
  ggtitle("GDP by Sector Proportions (1987-2016)") + 
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

ggplot(plot_df, aes(x=Year, y=Income, fill= Sector)) +
  ggtitle("GDP by Sector Proportions (1987-2016)") +
  geom_area(alpha = 0.75) + xlab("Year") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

Results

Correlation Analysis

The Pearson correlation (\(r\)) measures the strength of linear relationships between variables. This analysis involves nonlinear relationships, so the metric used is the Kendall48 correlation (\(\tau\)), a non-parametric, rank-based measure of the strength of monotonic relationships between variables. These correlations are then tested for statistical significance; that is, each test asks whether the correlation differs significantly from zero given approximately normal variation (and covariance) in the data. In the formulas, \(m_x\) and \(m_y\) are the sample means of \(x\) and \(y\), \(n\) is the number of observations, and \(n_c\) and \(n_d\) are the numbers of concordant and discordant pairs.

\[r = \frac{\sum{(x-m_x)(y-m_y)}}{\sqrt{\sum{(x-m_x)^2}\sum{(y-m_y)^2}}} \tag{Pearson}\]

\[\tau = \frac{n_c - n_d}{\frac{1}{2}n(n-1)} \tag{Kendall}\]
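
The Kendall formula can be checked by hand on a small example. The sketch below uses made-up vectors (not data from this paper), counts concordant and discordant pairs directly, and compares the result with base R's `cor(..., method = "kendall")`. It also contrasts the two metrics on a monotone but nonlinear relationship, where Pearson falls below one while Kendall equals one.

```r
# Made-up illustrative vectors with no ties
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Enumerate every pair of observations (i, j) with i < j
pairs <- combn(length(x), 2)
dx <- x[pairs[1, ]] - x[pairs[2, ]]
dy <- y[pairs[1, ]] - y[pairs[2, ]]

# A pair is concordant when x and y move in the same direction
n_c <- sum(sign(dx) * sign(dy) > 0)
n_d <- sum(sign(dx) * sign(dy) < 0)

n <- length(x)
tau_manual  <- (n_c - n_d) / (0.5 * n * (n - 1))
tau_builtin <- cor(x, y, method = "kendall")

# Monotone but nonlinear relationship: Kendall is 1, Pearson is below 1
z <- exp(x)
pearson_xz <- cor(x, z, method = "pearson")
kendall_xz <- cor(x, z, method = "kendall")
```

Here two of the ten pairs are discordant, so \(\tau = (8 - 2)/10 = 0.6\). When ties are present, `cor` reports the tie-adjusted variant (tau-b) rather than the simple formula above.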

Bidirectional Impact of Policy Lag

There is a lag between the enactment of a law and its full implementation. Changes in laws necessitate the adjustment of institutional processes, which are constrained by the rules and procedures guiding the bureaucracies implementing the laws. It is worth noting that bureaucracies have been maligned through the negative connotation of red tape, but the bureaucratic adherence to rules is a feature, and not a bug, that hedges against chaos. No value judgment is being made here. It is simply true that immigration applications are processed on an ongoing basis while laws change and agencies readjust. Applications in progress must be managed around changing immigration laws. When looking at the relationship between legislation and other variables, it is therefore important to also look at the lagged relationships, and not just in one direction. When laws are reactive rather than proactive, there is also a lag between the motivation for a law and its enactment.

Legislation <- na.approx(window(cbind(
  Past_Laws_2yr=stats::lag(Laws, k=2), 
  Past_Laws_1yr=stats::lag(Laws, k=1),
  Current_Laws=Laws, 
  Future_Laws_1yr=stats::lag(Laws, k=-1),
  Future_Laws_2yr=stats::lag(Laws, k=-2)), 
  start=1790, end=2016), rule = 2)
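
Because the sign convention of `stats::lag` is easy to misread, a quick check on a toy series can confirm which direction each column looks. The sketch below uses a made-up annual `ts` object (not the `Laws` series) with illustrative column names. For a base R `ts`, a positive `k` shifts the time base back, so the value aligned with year \(t\) is the observation from year \(t+k\). The `zoo` class follows the same convention, while `xts` is documented to use the opposite sign, so the result depends on the class of `Laws`, which is assumed rather than known here.

```r
# Toy annual series: 10 in 2000, 20 in 2001, ..., 50 in 2004
x <- ts(c(10, 20, 30, 40, 50), start = 2000)

aligned <- cbind(
  k_plus_1  = stats::lag(x, k = 1),   # time base shifted back one year
  current   = x,
  k_minus_1 = stats::lag(x, k = -1)   # time base shifted forward one year
)

# Which original observation does each column hold in the row for 2002?
row_2002 <- as.numeric(window(aligned, start = 2002, end = 2002))
names(row_2002) <- colnames(aligned)
print(row_2002)
```

In the 2002 row, the `k = 1` column holds the 2003 observation and the `k = -1` column holds the 2001 observation. Running the same check on the actual `Laws` object is a simple way to verify that the past and future labels match the intended direction.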

Legislation and Demographic Factors

First the Kendall correlation between the number of new, one- and two-year old, and one- and two-year out immigration laws and demographic factors is examined. When it comes to population based on Race and Hispanic Origin, legislation has positive correlations between \(\tau=0.49\) and \(\tau=0.56\) for Hispanic origin and nearly every race. The strongest of these correlations are with the number of laws passed one year in the future. The exception to this pattern is Native-American population which has positive correlations that hover around \(\tau=0.25\), the strongest of which is with laws passed two years in the future. All of these correlations are significant at \(\alpha=0.05\). Increases in racial and ethnic subsections of the population are associated with future increases in the number of immigration laws. The strongest correlations are with Other Races then Asian and Pacific Islanders.

# Run the Kendall correlation test once and reuse the estimates and p-values
ct <- corr.test(na.approx(RAHO), Legislation, use="complete.obs", method="kendall")
r <- ct$r
p <- ct$p < 0.05
table_df <- matrix(paste(format(round(r, 3), nsmall = 3), ifelse(p, "Yes", "No"), sep=", "), 
  nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
  align="c", caption = "Laws with Race and Hispanic Origin") %>%
  add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Laws with Race and Hispanic Origin (each cell: correlation, significant)

| | Past Laws 2yr | Past Laws 1yr | Current Laws | Future Laws 1yr | Future Laws 2yr |
|---|---|---|---|---|---|
| White | 0.493, Yes | 0.504, Yes | 0.504, Yes | 0.505, Yes | 0.498, Yes |
| Black | 0.493, Yes | 0.504, Yes | 0.504, Yes | 0.505, Yes | 0.498, Yes |
| Native Am | 0.225, Yes | 0.236, Yes | 0.255, Yes | 0.272, Yes | 0.282, Yes |
| Asian Pac | 0.517, Yes | 0.527, Yes | 0.530, Yes | 0.534, Yes | 0.530, Yes |
| Other Race | 0.513, Yes | 0.529, Yes | 0.543, Yes | 0.560, Yes | 0.555, Yes |
| Hispanic | 0.493, Yes | 0.504, Yes | 0.504, Yes | 0.505, Yes | 0.498, Yes |
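
Several rows in this table (White, Black, and Hispanic) are identical to three decimal places. One plausible explanation, offered here as an assumption rather than a finding, is that Kendall's \(\tau\) depends only on rank order: any two series that rise strictly over the same period have the same ranks, and therefore the same \(\tau\) with a third variable. The sketch below illustrates this with simulated data; `laws`, `pop_a`, and `pop_b` are hypothetical names and values.

```r
set.seed(1)

# Hypothetical yearly counts of laws passed
laws <- rpois(50, lambda = 3)

# Two strictly increasing series with very different shapes
pop_a <- cumsum(runif(50, min = 1, max = 2))   # roughly linear growth
pop_b <- exp(seq(1, 5, length.out = 50))       # exponential growth

# Kendall's tau only sees the ordering, which is the same for both series
tau_a  <- cor(pop_a, laws, method = "kendall")
tau_b  <- cor(pop_b, laws, method = "kendall")
tau_ab <- cor(pop_a, pop_b, method = "kendall")
```

In this simulation `tau_a` equals `tau_b`, and the two population series are perfectly rank-correlated with each other. If the interpolated population series in this paper are likewise monotone, identical rows would be expected and would carry less independent information than their number suggests.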

For the Native and Foreign-born population, the correlations with the number of immigration laws passed are also moderate and statistically significant at \(\alpha=0.05\). Natives have positive correlations that hover around \(\tau=0.5\). Foreign-born have positive correlations that hover around \(\tau=0.43\). The strongest of these correlations are with the number of laws passed one year in the future. Increases in the native and foreign-born subsections of the population are associated with future increases in the number of immigration laws.

# Run the Kendall correlation test once and reuse the estimates and p-values
ct <- corr.test(na.approx(NAFB), Legislation, use="complete.obs", method="kendall")
r <- ct$r
p <- ct$p < 0.05
table_df <- matrix(paste(format(round(r, 3), nsmall = 3), ifelse(p, "Yes", "No"), sep=", "), 
  nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
  align="c", caption = "Laws with Native and Foreign-born") %>%
  add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Laws with Native and Foreign-born (each cell: correlation, significant)

| | Past Laws 2yr | Past Laws 1yr | Current Laws | Future Laws 1yr | Future Laws 2yr |
|---|---|---|---|---|---|
| Native Born | 0.493, Yes | 0.504, Yes | 0.504, Yes | 0.505, Yes | 0.498, Yes |
| Foreign Born | 0.424, Yes | 0.433, Yes | 0.434, Yes | 0.434, Yes | 0.425, Yes |
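
The table-building steps used throughout this section can be wrapped in a single helper. The following is a minimal base-R sketch, not the code used for the tables in this paper: it substitutes `cor.test` for `psych::corr.test`, so its p-values (and hence the Yes/No flags) may differ from those reported. The function name `kendall_table` and the toy data are hypothetical.

```r
# Build a character matrix of "tau, Yes/No" cells for every column pair
kendall_table <- function(x, y, alpha = 0.05, digits = 3) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  out <- matrix("", nrow = ncol(x), ncol = ncol(y),
                dimnames = list(colnames(x), colnames(y)))
  for (i in seq_len(ncol(x))) {
    for (j in seq_len(ncol(y))) {
      # Use only rows where both series are observed
      ok <- complete.cases(x[, i], y[, j])
      test <- cor.test(x[ok, i], y[ok, j], method = "kendall", exact = FALSE)
      tau <- format(round(unname(test$estimate), digits), nsmall = digits)
      out[i, j] <- paste(tau, ifelse(test$p.value < alpha, "Yes", "No"), sep = ", ")
    }
  }
  out
}

# Toy data: one series that tracks the target and one that is pure noise
set.seed(42)
trend  <- 1:40
x_demo <- cbind(Rising = trend + rnorm(40), Noise = rnorm(40))
y_demo <- cbind(Target = trend)
tab <- kendall_table(x_demo, y_demo)
print(tab)
```

The returned matrix has the same shape as `table_df` and could be passed to `kable` in the same way.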

The correlations between the number of immigration laws passed and Foreign-born Country of Origin are highest for Asia, Oceania, and Latin America, which have correlations between \(\tau=0.49\) and \(\tau=0.51\). The strongest of these correlations are with the number of laws passed one year in the future. The correlations for Africa are around \(\tau=0.38\) and are strongest with laws passed two years in the future. For all of these regions, the correlations with laws passed one and two years earlier are of similar size. The correlations of Europe and North America with immigration laws are between \(\tau=0.18\) and \(\tau=0.27\), with Europe on the lower end; the strongest of these are with the number of laws passed two years earlier. All of these correlations are significant at \(\alpha=0.05\). Increases in the foreign-born population from various regions are associated with future increases in the number of immigration laws, and the number of immigration laws passed is associated with future increases in the foreign-born population from various regions. There exists a bidirectional relationship between immigration laws and demographics.

# Run the Kendall correlation test once and reuse the estimates and p-values
ct <- corr.test(na.approx(FBCO), Legislation, use="complete.obs", method="kendall")
r <- ct$r
p <- ct$p < 0.05
table_df <- matrix(paste(format(round(r, 3), nsmall = 3), ifelse(p, "Yes", "No"), sep=", "), 
  nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
  align="c", caption = "Laws with Foreign-born Country of Origin") %>%
  add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Laws with Foreign-born Country of Origin (each cell: correlation, significant)

| | Past Laws 2yr | Past Laws 1yr | Current Laws | Future Laws 1yr | Future Laws 2yr |
|---|---|---|---|---|---|
| Europe | 0.221, Yes | 0.218, Yes | 0.206, Yes | 0.195, Yes | 0.182, Yes |
| Asia | 0.493, Yes | 0.504, Yes | 0.504, Yes | 0.505, Yes | 0.498, Yes |
| Africa | 0.353, Yes | 0.367, Yes | 0.383, Yes | 0.398, Yes | 0.404, Yes |
| Oceania | 0.495, Yes | 0.506, Yes | 0.507, Yes | 0.507, Yes | 0.501, Yes |
| Latin Am | 0.493, Yes | 0.504, Yes | 0.504, Yes | 0.505, Yes | 0.498, Yes |
| North Am | 0.267, Yes | 0.267, Yes | 0.258, Yes | 0.251, Yes | 0.237, Yes |

Legislation and Economic Factors

The Kendall correlation between the number of new, one- and two-year old, and one- and two-year out immigration laws and economic factors is examined next. The correlations are higher for employment-related Visas than they are for non-employment-related Visas. Although these correlations are larger, they are not very high. The correlations for employment-related Visas are between \(\tau=0.115\) and \(\tau=0.315\), with the largest appearing in the Future Laws 2yr column. Yet none of these correlations are significant at \(\alpha=0.05\).

# Run the Kendall correlation test once and reuse the estimates and p-values
ct <- corr.test(Visas, Legislation, use="complete.obs", method="kendall")
r <- ct$r
p <- ct$p < 0.05
table_df <- matrix(paste(format(round(r, 3), nsmall = 3), ifelse(p, "Yes", "No"), sep=", "), 
  nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
  align="c", caption = "Laws with Immigration Visas") %>%
  add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Laws with Immigration Visas (each cell: correlation, significant)

| | Past Laws 2yr | Past Laws 1yr | Current Laws | Future Laws 1yr | Future Laws 2yr |
|---|---|---|---|---|---|
| IV Emp | 0.126, No | 0.167, No | 0.160, No | 0.163, No | 0.315, No |
| IV NonEmp | 0.113, No | 0.091, No | 0.115, No | 0.062, No | 0.194, No |
| NIV Emp | 0.115, No | 0.150, No | 0.189, No | 0.237, No | 0.305, No |
| NIV NonEmp | 0.106, No | 0.078, No | 0.133, No | 0.052, No | 0.170, No |

Enforcement actions have low correlations with the number of immigration laws passed. The correlations are fairly constant across lags ranging between \(\tau=0.167\) and \(\tau=0.238\). None of these correlations are significant at \(\alpha=0.05\).

# Run the Kendall correlation test once and reuse the estimates and p-values
ct <- corr.test(Enforce, Legislation, use="complete.obs", method="kendall")
r <- ct$r
p <- ct$p < 0.05
table_df <- matrix(paste(format(round(r, 3), nsmall = 3), ifelse(p, "Yes", "No"), sep=", "), 
  nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
  align="c", caption = "Laws with Immigration Enforcement Actions") %>%
  add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Laws with Immigration Enforcement Actions (each cell: correlation, significant)

| | Past Laws 2yr | Past Laws 1yr | Current Laws | Future Laws 1yr | Future Laws 2yr |
|---|---|---|---|---|---|
| Removed | 0.167, No | 0.191, No | 0.209, No | 0.223, No | 0.214, No |
| Apprehended | 0.199, No | 0.184, No | 0.207, No | 0.213, No | 0.238, No |
| Returned | 0.220, No | 0.195, No | 0.215, No | 0.205, No | 0.215, No |

The largest absolute correlations between the number of immigration laws passed and GDP by Industry appear in the Future Laws 2yr column. Just like with Visas, however, although these correlations are larger, they are not very high. The absolute correlations between immigration laws and GDP by Industry range between \(|\tau|=0.002\) and \(|\tau|=0.305\), and none is significant at \(\alpha=0.05\).

# Run the Kendall correlation test once and reuse the estimates and p-values
ct <- corr.test(Economy, Legislation, use="complete.obs", method="kendall")
r <- ct$r
p <- ct$p < 0.05
table_df <- matrix(paste(format(round(r, 3), nsmall = 3), ifelse(p, "Yes", "No"), sep=", "), 
  nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
  align="c", caption = "Laws with the Economy") %>%
  add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Laws with the Economy (each cell: correlation, significant)

| | Past Laws 2yr | Past Laws 1yr | Current Laws | Future Laws 1yr | Future Laws 2yr |
|---|---|---|---|---|---|
| Agriculture | -0.067, No | 0.018, No | 0.086, No | 0.160, No | 0.199, No |
| Mining | -0.037, No | 0.073, No | 0.232, No | 0.201, No | 0.270, No |
| Utilities | 0.017, No | 0.108, No | 0.116, No | 0.191, No | 0.285, No |
| Construction | -0.002, No | 0.013, No | 0.076, No | 0.181, No | 0.260, No |
| Manufacturing | -0.057, No | 0.013, No | 0.136, No | 0.155, No | 0.290, No |
| Wholesale | -0.082, No | -0.028, No | 0.101, No | 0.150, No | 0.290, No |
| Retail | -0.077, No | -0.028, No | 0.091, No | 0.145, No | 0.295, No |
| Transportation | -0.077, No | -0.028, No | 0.121, No | 0.165, No | 0.280, No |
| Information | -0.087, No | -0.023, No | 0.096, No | 0.160, No | 0.290, No |
| FIRE | -0.082, No | -0.013, No | 0.096, No | 0.160, No | 0.270, No |
| Service | -0.087, No | -0.023, No | 0.101, No | 0.160, No | 0.285, No |
| Education | -0.092, No | -0.023, No | 0.091, No | 0.170, No | 0.290, No |
| Entertainment | -0.087, No | -0.023, No | 0.101, No | 0.160, No | 0.285, No |
| Other Industries | -0.082, No | -0.023, No | 0.096, No | 0.165, No | 0.285, No |
| Govt Federal | -0.027, No | 0.013, No | 0.131, No | 0.186, No | 0.305, No |
| Govt StateLoc | -0.092, No | -0.023, No | 0.091, No | 0.170, No | 0.290, No |

Demographic and Economic Factors

As previously mentioned, Kendall correlations between the economy and the number of immigration laws passed are statistically insignificant. This is not true for Kendall correlations between industries in the economy and many other factors. Correlations between industries in the economy, every race, and Hispanic origin are all high and statistically significant at \(\alpha=0.05\). Income for every industry increases with increases in these racial and ethnic subsections of the population.

# Run the Kendall correlation test once and reuse the estimates and p-values
ct <- corr.test(Economy, na.approx(RAHO), use="complete.obs", method="kendall")
r <- round(ct$r, 5)
p <- ct$p < 0.05
table_df <- matrix(paste(format(round(r, 3), nsmall = 3), ifelse(p, "Yes", "No"), sep=", "), 
  nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
  align="c", caption = "Economy with Race and Hispanic Origin") %>%
  add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Economy with Race and Hispanic Origin (each cell: correlation, significant)

| | White | Black | Native Am | Asian Pac | Other Race | Hispanic |
|---|---|---|---|---|---|---|
| Agriculture | 0.834, Yes | 0.830, Yes | 0.595, Yes | 0.830, Yes | 0.660, Yes | 0.830, Yes |
| Mining | 0.770, Yes | 0.766, Yes | 0.632, Yes | 0.766, Yes | 0.669, Yes | 0.766, Yes |
| Utilities | 0.701, Yes | 0.697, Yes | 0.738, Yes | 0.697, Yes | 0.747, Yes | 0.697, Yes |
| Construction | 0.793, Yes | 0.798, Yes | 0.756, Yes | 0.798, Yes | 0.830, Yes | 0.798, Yes |
| Manufacturing | 0.903, Yes | 0.899, Yes | 0.664, Yes | 0.899, Yes | 0.729, Yes | 0.899, Yes |
| Wholesale | 0.963, Yes | 0.968, Yes | 0.724, Yes | 0.968, Yes | 0.798, Yes | 0.968, Yes |
| Retail | 0.963, Yes | 0.968, Yes | 0.724, Yes | 0.968, Yes | 0.798, Yes | 0.968, Yes |
| Transportation | 0.963, Yes | 0.959, Yes | 0.715, Yes | 0.959, Yes | 0.789, Yes | 0.959, Yes |
| Information | 0.986, Yes | 0.991, Yes | 0.747, Yes | 0.991, Yes | 0.821, Yes | 0.991, Yes |
| FIRE | 0.959, Yes | 0.963, Yes | 0.729, Yes | 0.963, Yes | 0.802, Yes | 0.963, Yes |
| Service | 0.982, Yes | 0.986, Yes | 0.743, Yes | 0.986, Yes | 0.816, Yes | 0.986, Yes |
| Education | 0.995, Yes | 1.000, Yes | 0.756, Yes | 1.000, Yes | 0.830, Yes | 1.000, Yes |
| Entertainment | 0.982, Yes | 0.986, Yes | 0.743, Yes | 0.986, Yes | 0.816, Yes | 0.986, Yes |
| Other Industries | 0.977, Yes | 0.982, Yes | 0.738, Yes | 0.982, Yes | 0.811, Yes | 0.982, Yes |
| Govt Federal | 0.931, Yes | 0.936, Yes | 0.747, Yes | 0.936, Yes | 0.802, Yes | 0.936, Yes |
| Govt StateLoc | 0.995, Yes | 1.000, Yes | 0.756, Yes | 1.000, Yes | 0.830, Yes | 1.000, Yes |

The same is true for the Native and Foreign-born population and every industry. Correlations are high and statistically significant at \(\alpha=0.05\). Income for every industry increases with increases in native and foreign-born subsections of the population.

# Run the Kendall correlation test once and reuse the estimates and p-values
ct <- corr.test(Economy, na.approx(NAFB), use="complete.obs", method="kendall")
r <- round(ct$r, 5)
p <- ct$p < 0.05
table_df <- matrix(paste(format(round(r, 2), nsmall = 2), ifelse(p, "Yes", "No"), sep=", "), 
  nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
  align="c", caption = "Economy with Native and Foreign-born") %>%
  add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Economy with Native and Foreign-born (each cell: correlation, significant)

| | Native Born | Foreign Born |
|---|---|---|
| Agriculture | 0.83, Yes | 0.83, Yes |
| Mining | 0.77, Yes | 0.77, Yes |
| Utilities | 0.70, Yes | 0.70, Yes |
| Construction | 0.80, Yes | 0.80, Yes |
| Manufacturing | 0.90, Yes | 0.90, Yes |
| Wholesale | 0.97, Yes | 0.97, Yes |
| Retail | 0.97, Yes | 0.97, Yes |
| Transportation | 0.96, Yes | 0.96, Yes |
| Information | 0.99, Yes | 0.99, Yes |
| FIRE | 0.96, Yes | 0.96, Yes |
| Service | 0.99, Yes | 0.99, Yes |
| Education | 1.00, Yes | 1.00, Yes |
| Entertainment | 0.99, Yes | 0.99, Yes |
| Other Industries | 0.98, Yes | 0.98, Yes |
| Govt Federal | 0.94, Yes | 0.94, Yes |
| Govt StateLoc | 1.00, Yes | 1.00, Yes |

Unlike the other demographic factors, however, country of origin does not show the same homogeneity in statistical significance across every sector of the economy. At \(\alpha=0.05\), Asia, Africa, Oceania, and Latin America each have high and statistically significant correlations with every industry, but Europe and North America both have statistically insignificant correlations with every industry. Income for every industry increases with increases in the foreign-born population from Asia, Africa, Oceania, and Latin America.

# Run the Kendall correlation test once and reuse the estimates and p-values
ct <- corr.test(Economy, na.approx(FBCO), use="complete.obs", method="kendall")
r <- round(ct$r, 5)
p <- ct$p < 0.05
table_df <- matrix(paste(format(round(r, 2), nsmall = 2), ifelse(p, "Yes", "No"), sep=", "), 
  nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
  align="c", caption = "Economy with Foreign-born Country of Origin") %>%
  add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Economy with Foreign-born Country of Origin (each cell: correlation, significant)

| | Europe | Asia | Africa | Oceania | Latin Am | North Am |
|---|---|---|---|---|---|---|
| Agriculture | 0.24, No | 0.83, Yes | 0.83, Yes | 0.83, Yes | 0.83, Yes | 0.30, No |
| Mining | 0.30, No | 0.77, Yes | 0.77, Yes | 0.76, Yes | 0.77, Yes | 0.31, No |
| Utilities | 0.46, No | 0.70, Yes | 0.70, Yes | 0.69, Yes | 0.70, Yes | 0.36, No |
| Construction | 0.43, No | 0.80, Yes | 0.80, Yes | 0.79, Yes | 0.80, Yes | 0.41, No |
| Manufacturing | 0.34, No | 0.90, Yes | 0.90, Yes | 0.89, Yes | 0.90, Yes | 0.41, No |
| Wholesale | 0.34, No | 0.97, Yes | 0.97, Yes | 0.96, Yes | 0.97, Yes | 0.38, No |
| Retail | 0.34, No | 0.97, Yes | 0.97, Yes | 0.96, Yes | 0.97, Yes | 0.38, No |
| Transportation | 0.34, No | 0.96, Yes | 0.96, Yes | 0.95, Yes | 0.96, Yes | 0.39, No |
| Information | 0.31, No | 0.99, Yes | 0.99, Yes | 0.99, Yes | 0.99, Yes | 0.36, No |
| FIRE | 0.33, No | 0.96, Yes | 0.96, Yes | 0.96, Yes | 0.96, Yes | 0.39, No |
| Service | 0.32, No | 0.99, Yes | 0.99, Yes | 0.98, Yes | 0.99, Yes | 0.37, No |
| Education | 0.31, No | 1.00, Yes | 1.00, Yes | 1.00, Yes | 1.00, Yes | 0.35, No |
| Entertainment | 0.32, No | 0.99, Yes | 0.99, Yes | 0.98, Yes | 0.99, Yes | 0.37, No |
| Other Industries | 0.31, No | 0.98, Yes | 0.98, Yes | 0.98, Yes | 0.98, Yes | 0.36, No |
| Govt Federal | 0.36, No | 0.94, Yes | 0.94, Yes | 0.94, Yes | 0.94, Yes | 0.34, No |
| Govt StateLoc | 0.31, No | 1.00, Yes | 1.00, Yes | 1.00, Yes | 1.00, Yes | 0.35, No |

There is also a dichotomy between Immigrant and Nonimmigrant Visas and their relationship with the economy. At \(\alpha=0.05\), employment-related Nonimmigrant Visas have statistically significant correlations with every industry, but employment-related Immigrant Visas only have a statistically significant correlation with Utilities. Non-employment-related Nonimmigrant and Immigrant Visas do not have statistically significant correlations with any industry. Income for every industry increases with increases in employment-related Nonimmigrant Visas. Income for Utilities increases with increases in employment-related Immigrant Visas. This relationship between employment-related Immigrant Visas and Utilities could also be a statistical anomaly. When many tests are run at a 95% confidence level, about one in twenty tests of a true null hypothesis is expected to produce a false positive.
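
The scale of the false-positive concern can be put in numbers. The sketch below is illustrative arithmetic, assuming that the 16 industries and 4 Visa types in the table give 64 tests and that the tests are independent, which is unlikely to hold here because the series move together. The p-values passed to `p.adjust` are hypothetical, not results from this paper.

```r
alpha <- 0.05
m <- 16 * 4   # industries x Visa types, taken from the table dimensions

# Expected number of false positives if every null hypothesis were true
expected_false_pos <- m * alpha

# Chance of at least one false positive, assuming independent tests
p_at_least_one <- 1 - (1 - alpha)^m

# Hypothetical raw p-values and their Holm-adjusted counterparts
p_raw  <- c(0.001, 0.012, 0.030, 0.049, 0.200)
p_holm <- p.adjust(p_raw, method = "holm")

n_sig_raw  <- sum(p_raw < alpha)
n_sig_holm <- sum(p_holm < alpha)
```

Under these assumptions roughly three false positives would be expected among 64 tests, so a single isolated significant cell deserves caution. A correction such as Holm's, shown here with base R's `p.adjust`, is one way to control for this.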

# Run the Kendall correlation test once and reuse the estimates and p-values
ct <- corr.test(Economy, Visas, use="complete.obs", method="kendall")
r <- ct$r
p <- ct$p < 0.05
table_df <- matrix(paste(format(round(r, 2), nsmall = 2), ifelse(p, "Yes", "No"), sep=", "), 
  nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
  align="c", caption = "Economy with Immigration Visas") %>%
  add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Economy with Immigration Visas
Cell format: Correlation, Significant

| Industry | IV Emp | IV NonEmp | NIV Emp | NIV NonEmp |
|---|---|---|---|---|
| Agriculture | 0.43, No | 0.20, No | 0.81, Yes | 0.20, No |
| Mining | 0.48, No | 0.31, No | 0.76, Yes | 0.29, No |
| Utilities | 0.63, Yes | 0.30, No | 0.72, Yes | 0.24, No |
| Construction | 0.48, No | 0.30, No | 0.78, Yes | 0.25, No |
| Manufacturing | 0.45, No | 0.23, No | 0.89, Yes | 0.21, No |
| Wholesale | 0.43, No | 0.28, No | 0.95, Yes | 0.26, No |
| Retail | 0.42, No | 0.28, No | 0.95, Yes | 0.26, No |
| Transportation | 0.43, No | 0.27, No | 0.94, Yes | 0.26, No |
| Information | 0.42, No | 0.28, No | 0.96, Yes | 0.27, No |
| FIRE | 0.44, No | 0.28, No | 0.94, Yes | 0.27, No |
| Service | 0.43, No | 0.29, No | 0.96, Yes | 0.27, No |
| Education | 0.41, No | 0.29, No | 0.95, Yes | 0.28, No |
| Entertainment | 0.43, No | 0.29, No | 0.96, Yes | 0.27, No |
| Other Industries | 0.42, No | 0.28, No | 0.95, Yes | 0.27, No |
| Govt Federal | 0.39, No | 0.30, No | 0.91, Yes | 0.29, No |
| Govt StateLoc | 0.41, No | 0.29, No | 0.95, Yes | 0.28, No |

Enforcement actions also have mixed results. Apprehensions and Returns have statistically insignificant correlations with every industry in the economy, but Removals have strong statistically significant positive correlations with every industry in the economy at \(\alpha=0.05\). Income for every industry increases with increases in Removals. As stated earlier, Removals are compulsory actions based on a court order and Returns represent immigrants stopped (without a court order) from entering at the borders.
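The per-cell computation behind these correlation tables can be sketched in base R. This is a minimal illustration on simulated data, not the paper's series: `income` and `removals` are hypothetical stand-ins for one industry and one enforcement measure, and base `cor.test()` is used in place of `corr.test()` so that the example needs no extra packages.

```r
# Simulated, hypothetical series standing in for one industry and one enforcement measure.
set.seed(698)
years <- 1:30
income <- 100 + 5 * years + rnorm(30, sd = 10)   # upward-trending income
removals <- 20 + 2 * years + rnorm(30, sd = 8)   # upward-trending enforcement count

kt <- cor.test(income, removals, method = "kendall")
tau <- unname(kt$estimate)         # Kendall's tau
significant <- kt$p.value < 0.05   # same alpha = 0.05 rule as in the tables

# Format the cell the same way the tables do: "tau, Yes/No"
cell <- paste(format(round(tau, 2), nsmall = 2), ifelse(significant, "Yes", "No"), sep = ", ")
cell
```

Each table cell pairs the rounded \(\tau\) with the outcome of the significance test in this way.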

r <- corr.test(Economy, Enforce, use="complete.obs", method="kendall")$r
p <- corr.test(Economy, Enforce, use="complete.obs", method="kendall")$p < 0.05
table_df <- matrix(paste(format(round(r, 2), nsmall = 2), ifelse(p, "Yes", "No"), sep=", "), 
  nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
  align="c", caption = "Economy with Immigration Enforcement") %>%
  add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Economy with Immigration Enforcement
Cell format: Correlation, Significant

| Industry | Removed | Apprehended | Returned |
|---|---|---|---|
| Agriculture | 0.81, Yes | -0.39, No | -0.41, No |
| Mining | 0.80, Yes | -0.35, No | -0.39, No |
| Utilities | 0.67, Yes | -0.19, No | -0.22, No |
| Construction | 0.71, Yes | -0.27, No | -0.31, No |
| Manufacturing | 0.87, Yes | -0.33, No | -0.37, No |
| Wholesale | 0.85, Yes | -0.38, No | -0.43, No |
| Retail | 0.85, Yes | -0.39, No | -0.43, No |
| Transportation | 0.86, Yes | -0.38, No | -0.42, No |
| Information | 0.88, Yes | -0.40, No | -0.45, No |
| FIRE | 0.85, Yes | -0.38, No | -0.43, No |
| Service | 0.87, Yes | -0.40, No | -0.45, No |
| Education | 0.89, Yes | -0.41, No | -0.46, No |
| Entertainment | 0.87, Yes | -0.40, No | -0.45, No |
| Other Industries | 0.87, Yes | -0.40, No | -0.45, No |
| Govt Federal | 0.90, Yes | -0.38, No | -0.41, No |
| Govt StateLoc | 0.89, Yes | -0.41, No | -0.46, No |

The largest statistically significant correlations are between demographic factors and the number of immigration laws passed one year later. The difference among the lagged correlations is not very large, however; it averages around \(\tau \pm 0.02\).

Causality Analysis

\[Y_{ t }=\left( a_{ 0 }+a_{ 1 }Y_{ t-1 }+\cdots +a_{ p }Y_{ t-p } \right) +\left( b_{ 1 }X_{ t-1 }+\cdots +b_{ p }X_{ t-p } \right) +u_{ t } \tag{Y-Consequence}\]\[X_{ t }=\left( c_{ 0 }+c_{ 1 }X_{ t-1 }+\cdots +c_{ p }X_{ t-p } \right) +\left( d_{ 1 }Y_{ t-1 }+\cdots +d_{ p }Y_{ t-p } \right) +v_{ t } \tag{Y-Antecedent}\]

Granger Causality tests are useful for evaluating time series pairs for causality such that “\(X\) is said to Granger-cause \(Y\) if \(Y\) can be better predicted using the histories of both \(X\) and \(Y\) than it can by using the history of \(Y\) alone.”49 Essentially, the test checks whether one variable is useful in predicting the other when lagged values of both variables are taken into account. In terms of the above two equations, if \(H_{ A }: \exists b_{ i } \neq 0\) holds in the \(Y\)-Consequence equation, then \(X\) is a Granger-cause of \(Y\). If \(H_{ A }: \exists d_{ i } \neq 0\) holds in the \(Y\)-Antecedent equation, then \(Y\) is a Granger-cause of \(X\). The test has limitations in multivariate cases and does not rule out confounding variables, but it remains one of the most useful tools for analyzing causality in model building. If one variable can be used to predict future change in the other, then it is reasonable to build a regression model that uses the predictor variable to explain future changes in the other variable. To conduct the test effectively, the maximum order of integration \(d\) for the time series group must first be determined.
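The comparison described above, a model with the history of \(Y\) alone against a model with the histories of both \(X\) and \(Y\), can be sketched in base R as a nested-model \(F\)-test. This is a minimal illustration on simulated data under the assumption of a fixed lag order \(p\). The function name `granger_f` and the simulated series are hypothetical, and the sketch is not the `grangertest()` implementation used in the analysis.

```r
# Hypothetical simulated series: x drives y with a one-period delay.
set.seed(1)
n <- 200
x <- as.numeric(arima.sim(list(ar = 0.5), n = n))
y <- numeric(n)
for (t in 2:n) y[t] <- 0.3 * y[t - 1] + 0.8 * x[t - 1] + rnorm(1)

granger_f <- function(cause, effect, p) {
  # embed() puts the current value in column 1 and lags 1..p in columns 2..(p + 1)
  E <- embed(effect, p + 1)
  C <- embed(cause, p + 1)
  y_now  <- E[, 1]
  y_lags <- E[, -1, drop = FALSE]
  x_lags <- C[, -1, drop = FALSE]
  restricted   <- lm(y_now ~ y_lags)            # history of the effect only
  unrestricted <- lm(y_now ~ y_lags + x_lags)   # adds the history of the cause
  a <- anova(restricted, unrestricted)          # F-test of H0: all b_i = 0
  c(F = a$F[2], p = a$`Pr(>F)`[2])
}

res_xy <- granger_f(x, y, p = 2)   # does x Granger-cause y?
res_yx <- granger_f(y, x, p = 2)   # does y Granger-cause x?
res_xy
res_yx
```

Running the test in both directions mirrors the \(Y\)-Consequence and \(Y\)-Antecedent equations: a small \(p\)-value in the first call supports \(\exists b_i \neq 0\), and a small \(p\)-value in the second supports \(\exists d_i \neq 0\).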

Correlated <- data.frame(Laws, RAHO, NAFB, FBCO)
Arima_Orders <- data.frame(matrix(NA, nrow = ncol(Correlated), ncol = 5))
for (i in 1:ncol(Correlated)){
  ts_var <- na.approx(Correlated[, i])
  adf <- tseries::adf.test(ts_var)$p.value < 0.05
  Arima_Orders[i, 1] <- ifelse(adf, "Yes", "No")
  l <- BoxCox.lambda(ts_var)
  Arima_Orders[i, 2] <- format(round(l, 3), nsmall = 2)
  aa <- auto.arima(ts_var, stepwise=F, approximation=F, d=ifelse(adf, 0, NA), lambda=l)
  Arima_Orders[i, 3] <- aa$arma[1]
  Arima_Orders[i, 4] <- aa$arma[6]
  Arima_Orders[i, 5] <- aa$arma[2]
}
colnames(Arima_Orders) <- c("Stationary", "Box-Cox Transform", "p", "d", "q")
rownames(Arima_Orders) <- gsub('_', ' ', colnames(Correlated))
kable(Arima_Orders, longtable = T, booktabs = T, row.names=T, linesep = "",
  align="c", caption = "Time Series ARIMA Modeling") %>%
  kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
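Time Series ARIMA Modeling

| Variable | Stationary | Box-Cox Transform | p | d | q |
|---|---|---|---|---|---|
| Laws | Yes | 0.274 | 5 | 0 | 0 |
| White | No | 0.374 | 0 | 2 | 1 |
| Black | No | 0.22 | 1 | 1 | 0 |
| Native Am | Yes | -0.366 | 5 | 0 | 0 |
| Asian Pac | No | 0.255 | 1 | 2 | 2 |
| Other Race | No | 0.255 | 1 | 2 | 2 |
| Hispanic | Yes | 0.081 | 0 | 0 | 0 |
| Native Born | No | 0.302 | 1 | 2 | 1 |
| Foreign Born | No | -0.004 | 0 | 2 | 0 |
| Europe | No | 0.217 | 2 | 2 | 3 |
| Asia | No | 0.12 | 1 | 1 | 1 |
| Africa | No | 0.44 | 1 | 2 | 2 |
| Oceania | No | -0.019 | 4 | 2 | 1 |
| Latin Am | No | 0.113 | 1 | 1 | 1 |
| North Am | No | 0.54 | 2 | 2 | 2 |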

The auto.arima() function in R chooses ARIMA models automatically using a variation of the Hyndman and Khandakar algorithm, which combines unit root tests, minimization of the AICc, and maximum likelihood estimation to obtain an ARIMA model. The results suggest that the maximum order of integration is \(d = 2\). Using this maximum order of integration, tests are then run to check for the existence of Granger Causality.

Causality <- data.frame(matrix(NA, nrow = ncol(Correlated[,-1]), ncol = 2))
for (i in 2:ncol(Correlated)){
  for (j in 1:2) {
    cause <- as.data.frame(na.approx(Correlated[,ifelse(j==1, 1, i)]))
    effect <- as.data.frame(na.approx(Correlated[,ifelse(j==1, i, 1)]))
    d <- max(Arima_Orders[,"d"])
    gtest <- grangertest(cause, effect, order = d)
    f <- format(round(gtest$F[2], 2), nsmall = 2) 
    p <- gtest$`Pr(>F)`[2]<0.05
    Causality[i-1,j] <- paste(f, ifelse(p, "Yes", "No"), sep=", ")
  }
}
colnames(Causality) <- c("Number of Laws (Antecedent)", 
                         "Number of Laws (Consequence)")
rownames(Causality) <- gsub('_', ' ', colnames(Correlated[,-1]))
kable(Causality, longtable = T, booktabs = T, row.names=T, linesep = "",
  align="c", caption = "Legislation-Demographic Causality Test") %>%
  add_header_above(c(" ", "Granger Causality Statistic, Significant" = ncol(Causality))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Legislation-Demographic Causality Test
Cell format: Granger Causality Statistic, Significant

| Demographic | Number of Laws (Antecedent) | Number of Laws (Consequence) |
|---|---|---|
| White | 1.61, No | 12.58, Yes |
| Black | 0.83, No | 13.51, Yes |
| Native Am | 2.68, No | 8.12, Yes |
| Asian Pac | 2.85, No | 5.86, Yes |
| Other Race | 1.28, No | 8.47, Yes |
| Hispanic | 4.17, Yes | 12.46, Yes |
| Native Born | 3.94, Yes | 13.36, Yes |
| Foreign Born | 0.86, No | 9.21, Yes |
| Europe | 1.10, No | 1.22, No |
| Asia | 0.33, No | 6.47, Yes |
| Africa | 5.02, Yes | 3.58, Yes |
| Oceania | 1.78, No | 9.98, Yes |
| Latin Am | 5.81, Yes | 11.49, Yes |
| North Am | 2.97, No | 2.27, No |

The test results indicate that changes in most demographics exhibit the characteristics necessary for being a Granger cause of the number of immigration laws passed. The exceptions are foreign-born of European and North American origin. Granger Causality does not imply true causality, but the results do indicate that changes in the demographic variables displaying Granger Causality are useful predictors for forecasting future changes in the number of immigration laws enacted. The bidirectional relationship with some variables (Hispanic, Native Born, Africa, and Latin America) suggests a feedback loop, which implies that changes in the number of immigration laws enacted are also a useful predictor for forecasting future changes in those four demographic elements.

Model Selection

Based on the strength of the one-year correlations and the bidirectional Granger causality for some of the correlated factors, a few models will be constructed and tested. First, a model will be constructed that attempts to explain the number of laws that will be passed one year in the future based on changes to multiple demographic factors. Then, four smaller models will be constructed that attempt to explain changes to the Hispanic, Native-born, foreign-born from Africa, and foreign-born from Latin America population segments based on the number of laws passed one year prior. Given the oddities in the distributions of some of these time series, multiple regression model types are tuned and examined to see which model type is most suited to these data.

  • Linear Regression Models
    • Robust Linear Regression (RLM)
    • Principal Component Regression (PCR)
    • Partial Least Squares (PLS)
    • Elastic Net Regression (ENET)
  • Nonlinear Regression Models
    • Artificial Neural Networks (ANN)
    • Multivariate Adaptive Regression Splines (MARS)
    • Support Vector Machines (SVM)
    • \(K\)-Nearest Neighbors (KNN)
  • Tree-Based Regression Models
    • Classification and Regression Tree (CART)
    • Random Forest (RF)
    • Stochastic Gradient Boosting (SGB)
    • Rule-Based Cubist (CUBE)
model_vars <- na.approx(Correlated[,c(1,3:14)])
set.seed(698)
rows_train <- createDataPartition(model_vars[,1], p=0.75, list=F)
X_train <- model_vars[rows_train, -1]
X_test <- model_vars[-rows_train, -1]
Y_train <- model_vars[rows_train, 1]
Y_test <- model_vars[-rows_train, 1]
set.seed(698)
ctrl <- trainControl(method = "cv", number = 10)

Linear Regression Models

set.seed(698)
tune01 <- train(x = lag(X_train, k=1), y = Y_train,
  preProcess = c("BoxCox","center","scale","knnImpute"),
  method = "rlm", trControl = ctrl)
plot(tune01, main=tune01$modelInfo$label)

set.seed(698)
tune02 <- train(x = lag(X_train, k=1), y = Y_train,
  preProcess = c("BoxCox","center","scale","knnImpute"),
  method = "pcr", trControl = ctrl, tuneLength = 25)
plot(tune02, main=tune02$modelInfo$label)

set.seed(698)
tune03 <- train(x = lag(X_train, k=1), y = Y_train, 
  preProcess = c("BoxCox","center","scale","knnImpute"),
  method = "pls", trControl = ctrl, tuneLength = 25)
plot(tune03, main=tune03$modelInfo$label)

set.seed(698)
tg <- expand.grid(lambda = c(0, 0.05, .1), fraction = seq(0.05, 1, length = 25))
tune04 <- train(x = lag(X_train, k=1), y = Y_train, 
  preProcess = c("BoxCox","center","scale","knnImpute"),
  method = "enet", trControl = ctrl, tuneGrid = tg)
plot(tune04, main=tune04$modelInfo$label)

Nonlinear Regression Models

set.seed(698)
tg <- expand.grid(.decay = c(0, 0.01, .1), .size = c(1:10), .bag = F)
tune05 <- train(x = lag(X_train, k=1), y = Y_train,
  method = "avNNet", tuneGrid = tg, trControl = ctrl, linout = T, 
  preProcess = c("BoxCox","center","scale","knnImpute"),
  trace = F, MaxNWts = 10 * (ncol(X_train) + 1) + 10 + 1, maxit = 500)
plot(tune05, main=tune05$modelInfo$label)

set.seed(698)
tg <- expand.grid(degree = c(1:2), nprune = c(2:10))
tune06 <- train(x = lag(X_train, k=1), y = Y_train,
  preProcess = c("BoxCox","center","scale","knnImpute"),              
  method = "earth", tuneGrid = tg, trControl = ctrl)
plot(tune06, main=tune06$modelInfo$label)

set.seed(698)
tg <- expand.grid(C=c(0.01,0.05,0.1), degree=c(1,2), scale=c(0.25,0.5,1))
tune07 <- train(x = lag(X_train, k=1), y = Y_train,
  preProcess = c("BoxCox","center","scale","knnImpute"),
  method = "svmPoly",  tuneGrid = tg,  trControl = ctrl)
plot(tune07, main=tune07$modelInfo$label)

set.seed(698)
tg <- data.frame(.k = 1:20)
tune08 <- train(x = lag(X_train, k=1), y = Y_train,
  preProcess = c("BoxCox","center","scale","knnImpute"),
  method = "knn", tuneGrid = tg, trControl = ctrl)
plot(tune08, main=tune08$modelInfo$label)

Tree-based Regression Models

set.seed(698)
tg <- expand.grid(maxdepth= seq(1,10,by=1))
tune09 <- train(x = lag(X_train, k=1), y = Y_train,
  preProcess = c("BoxCox","center","scale","knnImpute"),
  method = "rpart2", tuneGrid = tg, trControl = ctrl)
plot(tune09, main=tune09$modelInfo$label)

set.seed(698)
P <- ncol(X_train) 
tg <- expand.grid(mtry=seq(2, P, by = floor(P/5)))
tune10 <- train(x = lag(X_train, k=1), y = Y_train,
  preProcess = c("BoxCox","center","scale","knnImpute"),
  method = "rf", tuneGrid = tg, trControl = ctrl)
plot(tune10, main=tune10$modelInfo$label)

set.seed(698)
tg <- expand.grid(interaction.depth=seq(1,6,by=1), n.trees=c(25,50,100,200),
  shrinkage=c(0.01,0.05,0.1,0.2), n.minobsinnode=10)
tune11 <- train(x = lag(X_train, k=1), y = Y_train,
  preProcess = c("BoxCox","center","scale","knnImpute"),
  method = "gbm", tuneGrid = tg, trControl = ctrl, verbose=F)
plot(tune11, main=tune11$modelInfo$label)

set.seed(698)
tg <- expand.grid(committees = c(1,5,10,20,50,100), neighbors = c(0,1,3,5,7))
tune12 <- train(x = lag(X_train, k=1), y = Y_train,
  preProcess = c("BoxCox","center","scale","knnImpute"),
  method = "cubist", tuneGrid = tg, trControl = ctrl)
plot(tune12, main=tune12$modelInfo$label)

Model Comparison

All the models perform almost equivalently on these data. As a matter of fact, the optimal RMSE, \(R^2\), and MAE training set resampling performance metrics are associated with different models depending on the random seed. Cases such as this, where the resampling performance metrics are nearly identical, highlight the need to validate training set resampling results (which are highly optimistic as a result of the repeated sampling) with a test set. Based on the RMSE of the training set resampling, the optimal linear, nonlinear, and tree-based models are, respectively, the Partial Least Squares (PLS), \(K\)-Nearest Neighbors (KNN), and Stochastic Gradient Boosting (SGB) models. These training set resampling rankings hold for the PLS and KNN models after running the test set validation. Since the model performance is essentially equivalent and the goal of this modeling process is explanation of relationships between variables rather than prediction, the non-covariance-based KNN model, which cannot be cleanly summarized, will not be used.50 The focus will be on covariance-based linear models.

fits <- list(RLM=tune01, PCR=tune02, PLS=tune03, ENET=tune04, ANN=tune05, MARS=tune06,
  SVM=tune07, KNN=tune08, CART=tune09, RF=tune10, SGB=tune11, CUBE=tune12)
bwplot(resamples(fits))

metrics <- function(tune) {
  RMSE = min(tune$results$RMSE)
  Rsquared = max(tune$results$Rsquared)
  MAE = min(tune$results$MAE)
  return(cbind(RMSE, Rsquared, MAE)) }
resampling <- data.frame(rbind(metrics(tune01), metrics(tune02), 
  metrics(tune03), metrics(tune04), metrics(tune05), metrics(tune06), 
  metrics(tune07), metrics(tune08), metrics(tune09), metrics(tune10), 
  metrics(tune11), metrics(tune12)), row.names = c("RLM","PCR","PLS",
  "ENET", "ANN", "MARS", "SVM", "KNN", "CART", "RF", "SGB", "CUBE"))
validation <- data.frame(row.names = c("RLM","PCR","PLS", "ENET", 
  "ANN", "MARS",  "SVM", "KNN", "CART", "RF", "SGB", "CUBE"), rbind(
  postResample(pred = predict(tune01, newdata = X_test), obs = Y_test),
  postResample(pred = predict(tune02, newdata = X_test), obs = Y_test),
  postResample(pred = predict(tune03, newdata = X_test), obs = Y_test),
  postResample(pred = predict(tune04, newdata = X_test), obs = Y_test),
  postResample(pred = predict(tune05, newdata = X_test), obs = Y_test),
  postResample(pred = predict(tune06, newdata = X_test), obs = Y_test),
  postResample(pred = predict(tune07, newdata = X_test), obs = Y_test),
  postResample(pred = predict(tune08, newdata = X_test), obs = Y_test),
  postResample(pred = predict(tune09, newdata = X_test), obs = Y_test),
  postResample(pred = predict(tune10, newdata = X_test), obs = Y_test),
  postResample(pred = predict(tune11, newdata = X_test), obs = Y_test),
  postResample(pred = predict(tune12, newdata = X_test), obs = Y_test)))
kable(cbind(resampling, validation), longtable = T, booktabs = T, row.names=T, linesep = "",
  caption = "Performance Metrics") %>%
  add_header_above(c("","Training Set Resampling" = 3, "Test Set Validation" = 3)) %>%
  kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Performance Metrics
Train = Training Set Resampling; Test = Test Set Validation

| Model | Train RMSE | Train Rsquared | Train MAE | Test RMSE | Test Rsquared | Test MAE |
|---|---|---|---|---|---|---|
| RLM | 1.101253 | 0.4215269 | 0.7128411 | 1.717429 | 0.3141630 | 0.9216199 |
| PCR | 1.087937 | 0.4378742 | 0.7331757 | 1.673304 | 0.3429144 | 0.9411131 |
| PLS | 1.086624 | 0.4257641 | 0.7380014 | 1.672647 | 0.3426580 | 0.9397111 |
| ENET | 1.090662 | 0.4184976 | 0.7339712 | 1.697809 | 0.3291550 | 0.9539869 |
| ANN | 1.134243 | 0.3915584 | 0.7437596 | 1.743432 | 0.2716640 | 0.9865738 |
| MARS | 1.116175 | 0.3960809 | 0.7643166 | 1.683975 | 0.3233568 | 0.9352765 |
| SVM | 1.104148 | 0.4127897 | 0.7148267 | 1.743417 | 0.2898321 | 0.8877538 |
| KNN | 1.057420 | 0.4609521 | 0.7081444 | 1.633050 | 0.3684154 | 0.9598214 |
| CART | 1.127081 | 0.3890166 | 0.7731064 | 1.826424 | 0.1877542 | 1.0078656 |
| RF | 1.242846 | 0.3149483 | 0.8172722 | 1.489037 | 0.4894702 | 1.0142196 |
| SGB | 1.071784 | 0.4307816 | 0.7269328 | 1.699391 | 0.3222708 | 0.9427416 |
| CUBE | 1.116010 | 0.3955091 | 0.7271977 | 1.603782 | 0.3949694 | 0.9653687 |
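For readers unfamiliar with the three metrics reported in the table, a minimal base-R sketch of their definitions follows. It assumes that Rsquared is the squared Pearson correlation between observed and predicted values, which is how `postResample()` is understood to report it. The function name `post_resample_sketch` and the toy vectors are hypothetical.

```r
# Minimal base-R version of the three metrics; post_resample_sketch is a hypothetical name.
post_resample_sketch <- function(pred, obs) {
  c(RMSE     = sqrt(mean((obs - pred)^2)),  # root mean squared error
    Rsquared = cor(obs, pred)^2,            # squared correlation (assumed definition)
    MAE      = mean(abs(obs - pred)))       # mean absolute error
}

# Toy observed and predicted values, for illustration only
obs  <- c(1, 0, 3, 2, 5)
pred <- c(1.5, 0.5, 2, 2, 4)
m <- post_resample_sketch(pred, obs)
m
```

RMSE penalizes large errors more heavily than MAE, which is why the two can rank models differently, as they do for SVM and RF in the test set columns.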

Coefficient Analysis

LagFwd <- data.frame(stats::lag(na.approx(data698[,1]), k=-1))
LagBwd <- data.frame(stats::lag(na.approx(data698[,1]), k=1))
X <- data.frame(na.approx(data698[,c(3:11,13:16)]))
X[, 2] <- X[, 2] + X[, 3]
colnames(X)[2] <- gsub("_.*", "", colnames(X)[2])
X <- X[, -3]
prepro <- preProcess(X, method=c("BoxCox", "center", "scale", "knnImpute"))
demographics <- predict(prepro, X)
LawsEffect <- data.frame(LagFwd, demographics)
colnames(LawsEffect) <- c("LawsFuture", colnames(demographics))
LawsCause <- data.frame(LagBwd, demographics)
colnames(LawsCause) <- c("LawsPast", colnames(demographics))

Partial Least Squares (PLS)

The first model attempts to explain the number of laws that will be passed one year in the future based on changes to multiple demographic factors. There is, however, a very high level of multicollinearity among these demographic factors. In fact, attempting to remove highly correlated predictors removes nine of the twelve variables, leaving only foreign-born from Africa, Native-Americans, and Whites. These are the three variables with very unusual distributions due to slavery, genocide, and admissions being based on race until around 1950. PLS regression, which is essentially a supervised form of Principal Component Analysis, is a technique that works well when a high level of multicollinearity is present. A PLS model with two components (determined by the tuning process) is applied to Box-Cox transformations of these data, and the residuals are examined for normality with a Shapiro-Wilk test. Because the null hypothesis of the Shapiro-Wilk test is that the residuals are normally distributed, the reported result (\(W=0.762\), \(p<2.2\times 10^{-16}\)) rejects normality at \(\alpha=0.05\), and the coefficient estimates should be read with that departure from normality in mind. Examining legislation as a consequence, where the cause is demographic factor changes and the effect is the number of laws enacted one year later, the PLS model suggests that the variation in the number of immigration laws enacted is explained in varying amounts by earlier changes in each of the demographic factors. It is important to note that the PLS coefficients are interpreted here as reflecting the amount of change explained by each demographic factor and not the direction of the change. The direction of change was, however, noted in the Correlation Analysis, where all relationships were found to be positive. Therefore, it can be surmised that when there are increases in these demographic elements, the number of immigration laws enacted one year later increases. All of the coefficients are statistically significant at \(\alpha=0.05\) under the jackknife variance estimates.
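The description of PLS as supervised component extraction can be made concrete with a short base-R sketch of the first PLS component for a single response. It is an illustration on simulated, deliberately collinear predictors, not the `plsr()` fit used in the analysis. The predictors `a`, `b`, and `c`, and all other names, are hypothetical.

```r
# Three hypothetical, highly collinear predictors built from a shared signal z.
set.seed(2)
n <- 100
z <- rnorm(n)
X <- scale(cbind(a = z + rnorm(n, sd = 0.1),
                 b = z + rnorm(n, sd = 0.1),
                 c = z + rnorm(n, sd = 0.1)))
y <- as.numeric(scale(2 * z + rnorm(n)))

# First PLS weight vector: the direction in predictor space with maximal
# covariance with the response (this is where the "supervision" enters).
w1 <- crossprod(X, y)
w1 <- w1 / sqrt(sum(w1^2))   # normalize to unit length
t1 <- X %*% w1               # first PLS score: one composite predictor

fit <- lm(y ~ t1)            # regress the response on the single score
r2 <- summary(fit)$r.squared
round(cor(X), 2)             # pairwise predictor correlations are close to 1
r2
```

Because the collinear predictors are collapsed into one score before regression, the fit avoids the unstable coefficients that ordinary least squares would produce on `X` directly.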

fit1 <- plsr(LawsFuture ~ ., data = LawsEffect, ncomp = 2, validation = "CV", jackknife = T)
shapiro.test(fit1$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  fit1$residuals
## W = 0.76207, p-value < 2.2e-16
jack <- jack.test(fit1, ncomp = fit1$ncomp, use.mean = T)
model <- data.frame(jack[c(1:3,5)])
model <- model[order(model[,4]>0.05, -abs(model[,1])),]
colnames(model) <- c("Estimate", "Std. Error", "t value", "Pr(>|t|)")
rownames(model) <- gsub('_', ' ', rownames(model))
kable(model, longtable = T, booktabs = T, row.names=T, linesep = "",
  caption = "Number of Laws (Consequence)") %>%
  add_header_above(c("","Ordered Partial Least Squares (PLS) Coefficients" = ncol(model))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Number of Laws (Consequence)
Ordered Partial Least Squares (PLS) Coefficients

| Demographic | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| Other Race | 0.1794285 | 0.0598980 | 2.995566 | 0.0150643 |
| Native Am | 0.1733696 | 0.0530547 | 3.267750 | 0.0097176 |
| Africa | 0.1642788 | 0.0325645 | 5.044722 | 0.0006953 |
| Asian Pac | 0.1376798 | 0.0442626 | 3.110526 | 0.0125082 |
| Black | 0.0836869 | 0.0104169 | 8.033746 | 0.0000214 |
| Native Born | 0.0757040 | 0.0084934 | 8.913244 | 0.0000092 |
| White | 0.0737364 | 0.0086038 | 8.570215 | 0.0000127 |
| Foreign Born | 0.0626821 | 0.0106390 | 5.891742 | 0.0002314 |
| Hispanic | 0.0555109 | 0.0092094 | 6.027655 | 0.0001958 |
| Oceania | 0.0480144 | 0.0083247 | 5.767740 | 0.0002702 |
| Latin Am | 0.0434573 | 0.0071833 | 6.049785 | 0.0001905 |
| Asia | 0.0404559 | 0.0070373 | 5.748799 | 0.0002767 |

Robust Linear Model (RLM)

The remaining four models attempt to explain changes to four demographic factors based on the number of laws that were enacted one year prior. Addressing multicollinearity is unnecessary for these small models with one \(X\) variable. There are, however, irregularities in some of the distributions that suggest the need for a robust method, hence a Robust Linear Model (RLM). The RLM with bisquare weights and an intercept (determined by the tuning process) is applied to Box-Cox transformations of these data, and the residuals are examined for normality with a Shapiro-Wilk test. At \(\alpha=0.05\) the test rejects normality of the residuals for the Hispanic, Native Born, and Latin America models (\(p<0.001\) in each case), which is consistent with the irregular distributions that motivated a robust method. The exception is the model for foreign-born from Africa, which has a Shapiro-Wilk \(p\)-value of \(0.067\), so normality is not rejected. The Africa variable has an unusual distribution due to the period of slavery, but post-slavery the distribution of the transformed variable is more linear and therefore more suited to an RLM. Examining legislation as an antecedent, where the effect is demographic factor changes and the cause is the number of laws enacted one year prior, the RLM suggests that as the number of immigration laws enacted increases, specific demographic elements change, such as the number of Native Born \(\uparrow\), Hispanics \(\uparrow\), and foreign-born from specific regions of origin (Africa \(\uparrow\), Latin America \(\uparrow\)). These relationships are statistically significant at \(\alpha=0.05\).
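The idea behind bisquare weighting can be illustrated with a small iteratively reweighted least squares loop in base R. This is a simplified sketch on simulated data and is not the algorithm implemented by `lmRob()`. The function name `bisquare_irls`, the fixed iteration count, and the tuning constant \(c = 4.685\) (a conventional choice for Tukey's bisquare) are assumptions made for illustration.

```r
# Hypothetical data: a linear trend with a few gross outliers at the end of the series.
set.seed(698)
x <- 1:50
y <- 1 + 0.5 * x + rnorm(50)
y[46:50] <- y[46:50] + 30

bisquare_irls <- function(x, y, c = 4.685, iters = 50) {
  w <- rep(1, length(y))                      # first pass is ordinary least squares
  for (i in seq_len(iters)) {
    b <- coef(lm(y ~ x, weights = w))
    r <- y - (b[1] + b[2] * x)                # residuals for every observation
    u <- r / (c * mad(r))                     # scale by a robust spread estimate
    w <- ifelse(abs(u) < 1, (1 - u^2)^2, 0)   # Tukey bisquare weights
  }
  b                                           # coefficients from the final weighted fit
}

ols_slope    <- unname(coef(lm(y ~ x))[2])
robust_slope <- unname(bisquare_irls(x, y)[2])
c(OLS = ols_slope, Robust = robust_slope)
```

Observations with large scaled residuals receive weights near or equal to zero, so the robust slope stays close to the underlying trend while the ordinary least squares slope is pulled toward the outliers.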

ctrl <- lmRob.control(weight=c("Bisquare","Bisquare"))
fit2 <- lmRob(Hispanic ~ LawsPast, data = LawsCause, control=ctrl)
fit3 <- lmRob(Native_Born ~ LawsPast, data = LawsCause, control=ctrl)
fit4 <- lmRob(Africa ~ LawsPast, data = LawsCause, control=ctrl)
fit5 <- lmRob(Latin_Am ~ LawsPast, data = LawsCause, control=ctrl)
par(mfrow=c(2, 2), mar = c(0, 0.5, 2, 0), oma = c(0.5, 0.5, 2, 0.5))
plot(LawsCause$Hispanic, type="l", xaxt = "n", yaxt = "n", main="Hispanic", col="blue")
plot(LawsCause$Native_Born, type="l", xaxt = "n", yaxt = "n", main="Native Born", col="blue")
plot(LawsCause$Africa, type="l", xaxt = "n", yaxt = "n", main="Africa", col="blue")
plot(LawsCause$Latin_Am, type="l", xaxt = "n", yaxt = "n", main="Latin America", col="blue")
title("Box-Cox Transformations", outer=TRUE)

shapiro.test(fit2$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  fit2$residuals
## W = 0.96259, p-value = 1.132e-05
shapiro.test(fit3$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  fit3$residuals
## W = 0.96916, p-value = 7.518e-05
shapiro.test(fit4$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  fit4$residuals
## W = 0.98855, p-value = 0.06734
shapiro.test(fit5$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  fit5$residuals
## W = 0.92388, p-value = 2.025e-09
models <- rbind(Hispanic=summary(fit2)$coefficients[-1,], 
  Native_Born=summary(fit3)$coefficients[-1,], 
  Africa=summary(fit4)$coefficients[-1,], 
  Latin_Am=summary(fit5)$coefficients[-1,])
models <- models[order(models[,1], decreasing = T), ]
rownames(models) <- gsub('_', ' ', rownames(models))
kable(models, longtable = T, booktabs = T, row.names=T, linesep = "",
  caption = "Ordered Number of Laws (Antecedent)") %>%
  add_header_above(c("","Ordered Robust Linear Model (RLM) Coefficients" = ncol(models))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Ordered Number of Laws (Antecedent)
Ordered Robust Linear Model (RLM) Coefficients

| Demographic | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| Native Born | 0.4700338 | 0.0498572 | 9.427596 | 0 |
| Hispanic | 0.4345160 | 0.0459628 | 9.453639 | 0 |
| Latin Am | 0.3201317 | 0.0369501 | 8.663889 | 0 |
| Africa | 0.3142223 | 0.0481317 | 6.528390 | 0 |

Discussion

Correlations

Kendall correlations between the number of future and past immigration laws enacted and the demographic factors examined are statistically significant at \(\alpha=0.05\), while the correlations between the number of future and past immigration laws passed and all economic factors examined are not. The Correlation Analysis between Legislation and Demographic Factors shows statistically significant positive correlations between the number of future and past immigration laws passed and demographic factors. With just a few exceptions, nearly all of these correlations are around \(\tau=0.5\). The relationship between the number of future and past immigration laws and Native-Americans hovers around \(\tau=0.25\). This lower correlation makes sense intuitively since Native-Americans are not generally associated with legislation on immigration. The relationships between the number of future and past immigration laws and foreign-born from Europe and North America are all between \(\tau=0.18\) and \(\tau=0.27\). This is likely related to European and North American arrivals having an unnatural distribution that was inflated until race stopped being a factor in admissions around 1950.

Although not necessary for this analysis, in keeping with the meticulousness of the correlational analysis, every possible pairing of variables was examined. In doing this, it was found that the correlations between demographic factors and various industries in the economy contradict several stereotypes. The Correlation Analysis shows that all races, Hispanics, Native-born, and Foreign-born each have strong and statistically significant positive correlations of \(\tau=0.6\) or higher with every industry in the economy. This finding was refreshing, but it was also interesting to see that, among the individual regions of origin for Foreign-born, Europe and North America did not have statistically significant correlations with any industry. This, like the low correlations those regions had with legislation, is likely due to the unnatural distribution that European and North American arrivals have as a result of admissions being based on race until around 1950. It was surprising to see in the Correlation Analysis that employment-related Nonimmigrant (temporary) Visas had strong statistically significant correlations of \(\tau=0.72\) or higher with every sector of the economy, but nearly all employment-related Immigrant (permanent) Visas had statistically insignificant correlations. The exception is the \(\tau=0.63\) relationship between employment-related Immigrant Visas and Utilities. This may, however, just be a statistical anomaly, since roughly one in twenty tests of a true null hypothesis is expected to produce a false positive at a 95% confidence level. If the finding is real, it would not be counterintuitive, as it is easy to see how permanent relocations can be associated with increases in Utilities. It is strange, however, that this visa category has no statistically significant relationship with other sectors of the economy.

In the final section of the Correlation Analysis, every industry in the economy is found to have a strong and statistically significant positive correlation of \(\tau=0.67\) or higher with Removals. Yet no other enforcement action was found to have statistically significant correlations with the economy. At first glance this is strange, but it makes sense when looking at the definitions. Apprehensions and Returns are swift actions that involve a limited number of parties. Court-ordered Removals are prolonged processes that involve several institutions such as government agencies, private prison systems, and law firms.

Causation

It cannot be stressed enough that Granger Causality tests, like every other statistical test, do not prove or disprove any hypothesis. The results simply provide statistical support for or against an argument made by a hypothesis. It can be further stated that Granger Causality does not hypothesize true causality, but rather the existence of predictive ability, such that if \(X\) can help predict \(Y\), then \(X\) is a Granger cause of \(Y\). These tests use lagged time series observations to see if prior observations of \(X\) are useful in predicting future observations of \(Y\), and then test the same for the converse relationship. Again, this test has limitations in multivariate cases and does not rule out confounding variables, but it remains one of the most useful tools for analyzing causality in model building. The first step in conducting this lagged time series test appropriately involves determining the ARIMA order of differencing for each time series variable and then using the maximum order in the Granger Causality test. In this case the maximum order of differencing was found to be \(d=2\).

The Granger Causality test returns some unexpected results. The bidirectional relationship between the number of future and past immigration laws enacted and demographic factors does exist, but in a very disparate way. Legislation on immigration is a consequence of many demographic changes but an antecedent for change to only a few of those demographic factors. Specifically, changes in every demographic factor except foreign-born from Europe and North America are found to be a Granger cause for change in the number of immigration laws enacted in the future. Yet the laws that are enacted are found to be Granger causes for future change in only four demographic factors: Hispanic, Native-born, and foreign-born from Africa and Latin America. Hence, legislation that is enacted in response to various demographic changes impacts those demographic elements disproportionately. It could be argued that this bidirectional relationship does not exist and that these lagged results are independent of each other, but it is highly unlikely that laws are enacted by the U.S. Congress without regard to both past events and future outcomes.

Regression

The findings from both the Correlation and Causality analyses support the existence of a relationship between demographics and legislation lagged backward and forward. Capturing this bidirectional relationship calls for multiple models that together describe the effects of the feedback loop. The first model looks at the number of laws on immigration enacted as a consequence of demographic changes. This is a multivariate model where the independent variables are the demographic factors and the dependent variable is the number of immigration laws enacted one year later. The next set of models looks at the number of laws on immigration enacted as an antecedent for change to individual demographic factors. These are small univariate models where the dependent variable is a given demographic factor and the independent variable is the number of immigration laws enacted one year prior. The data, however, have some peculiarities that complicate the modeling process.

Although social phenomena tend to follow an exponential distribution, some of these data are not exponentially distributed. The impact of African slavery, Native American genocide, and race being used as a factor for admission into the country until 1950 distort many of the distributions. There was therefore a need to tune and test several candidate models on transformations of these data to see which provided the most appropriate fit. The model selection process evaluated four linear models, four nonlinear models, and four tree-based models. Interestingly enough, all of the models performed almost equivalently on these data. The average performance metrics of the training set resampling were \(\textrm{RMSE}=1.11\), \(R^2=0.41\), and \(\textrm{MAE}=0.74\). These metrics are highly optimistic because they are based on repeated sampling, but even after validating the results with the test set there was very little variation in model performance. The average performance metrics of the test set validation were \(\textrm{RMSE}=1.68\), \(R^2=0.33\), and \(\textrm{MAE}=0.95\). This homogeneity in performance is beneficial in that it negates the need to balance the tradeoff between model complexity and interpretability. Therefore, given the benefit of model indifference and the need to analyze coefficients rather than predict outcomes, linear models were chosen.
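For readers unfamiliar with the three performance metrics reported above, the following minimal sketch shows how they are commonly computed. The function name and the toy values are assumptions for illustration only. Note that \(R^2\) is computed here as one minus the ratio of residual to total sum of squares; some software instead reports the squared correlation between observed and predicted values, and the text above does not state which definition applies to the reported figures.

```python
import math
from typing import Sequence, Tuple


def regression_metrics(
    observed: Sequence[float], predicted: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (RMSE, R^2, MAE) for paired observed and predicted values."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)          # root mean squared error
    mae = sum(abs(e) for e in errors) / n  # mean absolute error
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot            # share of variance explained
    return rmse, r2, mae


# Toy values (hypothetical counts of laws, not the paper's data).
rmse, r2, mae = regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])
```

Because RMSE squares each error before averaging, a single large miss (the last pair above) raises RMSE more than it raises MAE, which is one reason both are usually reported together.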

Another issue with these data was the high level of multicollinearity. As mentioned in the modeling process, if all the highly correlated variables were removed from the model, only three of the twelve variables would be left. Those variables are foreign-born from Africa, Native-Americans, and Whites, all of which have distorted distributions due to the factors mentioned at the beginning of the previous paragraph. To address the multicollinearity, a Partial Least Squares (PLS) regression model is used. PLS regression is an enhanced Principal Component Analysis (PCA) regression model that takes the response variable into account when constructing the model components. The variance in the number of immigration laws passed one year in the future is explained in a statistically significant way by changes in every demographic factor that was found to be a Granger cause for the change. The significance of the variance estimates was derived using jackknife resampling. Although PLS coefficients reflect variance and not direction, the Correlational Analysis showed that these relationships are positive, meaning that as each of these demographic factors increases, so does the number of laws the following year. The variables that explain most of the variance in the number of future laws enacted are the demographic segments of Other Race, Native American, Asian Pacific, and foreign-born from Africa. Other Race seems rather odd until the definition is examined. Other Race is a catchall that has meant different things at different times: Hispanic when there was no such category, multiracial, and those who do not feel that the given categories describe them. It can be said that Other Race is the most diverse category of all. It also pulls back the curtain on the social construct of race and ethnicity, but that is not the topic of this analysis.
The results for Native Americans and foreign-born from Africa do not seem accurate at first glance, but they are congruent with the historical accounts cited in the Literature Review. Early laws focused on excluding African-Americans and untaxed Native-Americans from the right of citizenship. It was not until 1870, after slavery ended, that citizenship was extended to “aliens of African nativity and to persons of African descent” and not until 1924 that Native Americans “were granted full U.S. citizenship.”51 That changes in the number of Asian and Pacific Islanders impact the number of immigration laws passed the next year is not surprising. Many immigration laws focused on, or were applied against, those demographic segments, such as the Chinese Exclusion Act of 1882 and the Alien and Sedition Acts of 1798, the latter of which was exercised during World War II to facilitate the implementation of Japanese internment camps.
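The jackknife resampling mentioned above can be illustrated with a deliberately simplified sketch. Instead of a PLS coefficient, it estimates the standard error of an ordinary least-squares slope by refitting the model with each observation left out in turn. The function names and the toy data are assumptions for illustration and are not the paper's code or data.

```python
import math
from typing import Callable, List, Sequence, Tuple


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of a simple least-squares line of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def jackknife_se(
    x: Sequence[float],
    y: Sequence[float],
    estimator: Callable[[Sequence[float], Sequence[float]], float] = ols_slope,
) -> Tuple[float, float]:
    """Return (full-sample estimate, leave-one-out jackknife standard error)."""
    xs: List[float] = list(x)
    ys: List[float] = list(y)
    n = len(xs)
    # Re-estimate the statistic n times, each time omitting one observation.
    leave_one_out = [
        estimator(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) for i in range(n)
    ]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((v - mean_loo) ** 2 for v in leave_one_out)
    return estimator(xs, ys), math.sqrt(variance)


# Toy data (hypothetical): y rises by roughly 2 for each unit of x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
slope, se = jackknife_se(x, y)
t_like = slope / se  # a large ratio suggests the slope is distinguishable from zero
```

The ratio of the estimate to its jackknife standard error plays the role of a t statistic, which is the general idea behind assessing coefficient significance through resampling rather than through distributional assumptions.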

The small models had fewer modeling issues but were just as interesting. The model used for these data was a Robust Linear Model (RLM), which was chosen for its ability to handle outliers and irregularities effectively. The results of the model show that the number of laws enacted one year prior has a statistically significant impact on the four demographic segments that were found to have a correlational and causal relationship with lagged legislation. Those four demographic segments are Native born, Hispanics, foreign-born from Latin America, and foreign-born from Africa. The largest relationship was the one between the number of Native born in the population and immigration laws passed one year prior. There is nothing extraordinary about this relationship. When the number of immigration laws enacted increases, the number of Native born in the population increases the following year. The rest of the findings, however, are not so intuitive. When the number of immigration laws enacted increases, the number of Hispanics, foreign-born from Latin America, and foreign-born from Africa increases. There is something about this which does not seem quite right. Many laws, like the one proposed by the current presidential administration, focus on restricting those specific population segments in a way that would lead one to believe that a decrease would be more likely. Yet this is not the case empirically, and Professor Douglas Massey of Princeton, who has studied this phenomenon, has found the explanation.52 Massey found that immigration is a circular flow where immigrants come to work and then return home. When there are no restrictions, immigrants come and go as they please. When restrictions are placed on immigration, outflow decreases. Immigrants who would normally come and go freely now choose to stay because they risk not being able to return if they leave. Hence the increase in foreign-born from Africa and Latin America, and by extension Hispanics.
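To illustrate why a robust linear model is less sensitive to outliers than ordinary least squares, the sketch below fits a line by iteratively reweighted least squares with Huber weights. The choice of Huber weighting, the tuning constant 1.345, the scale estimate, and the toy data are all assumptions made for illustration; the text above does not specify which robust estimator or settings were used.

```python
import statistics
from typing import List, Sequence, Tuple


def weighted_line(
    x: Sequence[float], y: Sequence[float], w: Sequence[float]
) -> Tuple[float, float]:
    """Weighted least-squares line; returns (intercept, slope)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line(
    x: Sequence[float], y: Sequence[float], k: float = 1.345, iterations: int = 50
) -> Tuple[float, float]:
    """Robust line fit via iteratively reweighted least squares (Huber weights)."""
    weights: List[float] = [1.0] * len(x)
    intercept, slope = weighted_line(x, y, weights)  # start from plain OLS
    for _ in range(iterations):
        residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust scale estimate: median absolute residual, rescaled.
        scale = statistics.median([abs(r) for r in residuals]) / 0.6745
        if scale == 0:
            break  # the fit already passes through at least half of the points
        # Observations with large residuals receive proportionally less weight.
        weights = [
            1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in residuals
        ]
        intercept, slope = weighted_line(x, y, weights)
    return intercept, slope


# Toy data (hypothetical): y is close to 3 + 2x, except for one extreme
# value at the last observation.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [5.2, 6.9, 9.1, 10.8, 13.0, 15.1, 16.9, 19.2, 20.8, 60.0]

_, ols_slope_value = weighted_line(x, y, [1.0] * len(x))
_, robust_slope_value = huber_line(x, y)
```

In this toy example the single extreme value pulls the ordinary least-squares slope well away from 2, while the robust fit stays close to the trend followed by the other nine points.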

Conclusion

The current presidential administration is proposing targeted immigration reforms that it claims will improve the economy. Yet legislation on immigration does not have a statistically significant relationship with the economy, which has improved regardless of the number and type of immigration laws enacted throughout history. Claiming the economy will improve if a change is made, knowing that improvement is likely regardless of the change, is one way politicians exercise political savvy in advancing their interests. Much can be said about the ethics behind such self-serving behavior,53 but only a reference to further reading on the topic is provided in this analysis. The statistically significant relationship that does exist with legislation on immigration, however, is the bidirectional temporal relationship with demographics in the United States. Changes in demographics impact the number of laws on immigration enacted one year later, and the number of laws enacted one year prior impacts demographics. Although intent cannot be inferred, this research shows that opponents of the current presidential administration are more likely correct in stating that the proposed reforms will not improve the economy but will instead impact the overall diversity of the country. Opponents are not correct, however, about the alleged negative impact on diversity that will result from those changes. The number of Native born will increase, but due to interruptions in natural immigration outflows, there will also be increases in foreign-born from Africa and Latin America, and by extension Hispanics. If the proposed immigration reforms are meant to either improve the economy or reduce diversity, they will accomplish neither.


  1. https://www.migrationpolicy.org/research/timeline-1790

  2. https://www.archives.gov/files/research/naturalization/420-major-immigration-laws.pdf

  3. https://www.whitehouse.gov/briefings-statements/white-house-framework-immigration-reform-border-security/

  4. https://web.archive.org/web/20180221163615/https://www.uscis.gov/aboutus

  5. https://www.uscis.gov/aboutus

  6. https://www.migrationpolicy.org/

  7. Kaufman, Bruce E., and Julie L. Hotchkiss. The Economics of Labor Markets. Thomson/South-Western, 2006.

  8. https://help.cbp.gov/app/answers/detail/a_id/72

  9. https://www.census.gov/newsroom/pdf/cspan_fb_slides.pdf

  10. https://www.archives.gov/publications/prologue/2002/summer/immigration-law-1.html

  11. https://www.washingtonpost.com/news/wonk/wp/2018/02/06/immigration-plan

  12. https://factfinder.census.gov/faces/nav/jsf/pages/searchresults.xhtml

  13. https://www.census.gov/prod/www/decennial.html

  14. https://www.census.gov/population/www/documentation/twps0076/twps0076.html

  15. https://www.census.gov/prod/cen2000/dp1/2kh00.pdf

  16. https://www.census.gov/history/pdf/orangenj-92017.pdf

  17. https://www.census.gov/population/www/documentation/twps0081/twps0081.html

  18. https://www.census.gov/population/www/documentation/twps0029/tab01.html

  19. https://www.census.gov/population/www/documentation/twps0029/tab02.html

  20. http://unstats.un.org/unsd/demographic/products/dyb/dyb2.htm

  21. http://www.slavevoyages.org/assessment/estimates

  22. https://nativestudy.wordpress.com/

  23. http://mathworld.wolfram.com/Interpolation.html

  24. https://github.com/jzuniga123/SPS/blob/master/DATA%20698/US_Census_Data.csv

  25. https://www.uscis.gov/tools/reports-studies/immigration-forms-data

  26. https://travel.state.gov/content/travel/en/legal/visa-law0/visa-statistics.html

  27. https://travel.state.gov/content/travel/en/legal/visa-law0/visa-statistics/nonimmigrant-visa-statistics.html

  28. http://www.ustraveldocs.com/in/in-niv-typeall.asp#

  29. https://www.foreignlaborcert.doleta.gov/perm.cfm

  30. https://www.foreignlaborcert.doleta.gov/performancedata.cfm

  31. http://www.flcdatacenter.com/

  32. https://travel.state.gov/content/travel/en/us-visas/visa-information-resources/all-visa-categories.html

  33. https://guides.law.fsu.edu/c.php?g=84898&p=546805

  34. https://ilw.com/resources/

  35. https://www.archives.gov/files/research/naturalization/420-major-immigration-laws.pdf

  36. https://www.congress.gov/search

  37. https://github.com/jzuniga123/SPS/blob/master/DATA%20698/ImmigrationLaws_1790to2012.csv

  38. https://github.com/jzuniga123/SPS/blob/master/DATA%20698/USCIS_2009to2017.csv

  39. https://www.dhs.gov/immigration-statistics/yearbook

  40. https://www.dhs.gov/immigration-statistics/yearbook/2016/table6

  41. https://www.dhs.gov/immigration-statistics/yearbook/2016/table25

  42. https://www.dhs.gov/immigration-statistics/yearbook/2016/table33

  43. https://www.dhs.gov/immigration-statistics/yearbook/2016/table39

  44. https://github.com/jzuniga123/SPS/blob/master/DATA%20698/DHS_1820to2016.csv

  45. https://www.bea.gov/iTable/iTable.cfm?ReqID=51&step=1#reqid=51&step=2&isuri=1

  46. https://github.com/jzuniga123/SPS/blob/master/DATA%20698/GDPbyIndustry_1987to2016.csv

  47. https://github.com/jzuniga123/SPS/blob/master/DATA%20698/DATA698_Project.csv

  48. https://onlinecourses.science.psu.edu/stat509/node/158

  49. http://davegiles.blogspot.co.uk/2011/04/testing-for-granger-causality.html

  50. Kuhn, Max, and Kjell Johnson. Applied Predictive Modeling. Springer Science+Business Media, 2013.

  51. https://github.com/jzuniga123/SPS/blob/master/DATA%20698/Immigration_Legislation.pdf

  52. https://www.princeton.edu/news/2016/04/20/tighter-enforcement-along-us-mexico-border-backfired-researchers-find

  53. http://www.hbs.edu/faculty/Publication%20Files/self-serving+justifications