Correlation Analysis
Pearson (\(r\)) correlation measures the strength of relationships between linear variables. This analysis involves nonlinear variables therefore the metric used is Kendall correlation (\(\tau\)) which is a non-parametric measure of the strength of relationships between nonlinear variables. These correlations are then tested for statistical significance. That is, is the existence of the correlations significantly different from zero given approximately normal variation (and covariance) in the data.
\[r = \frac{\sum{(x-m_x)(y-m_y)}}{\sqrt{\sum{(x-m_x)^2}\sum{(y-m_y)^2}}} \tag{Pearson}\]\[\tau = \frac{n_c - n_d}{\frac{1}{2}n(n-1)} \tag{Kendall}\]
Bidirectional Impact of Policy Lag
There is a lag between the enactment of a law and its full implementation. Changes in laws necessitate the adjustment of institutional processes which are mired by the rules and procedures guiding the bureaucracies implementing the laws. It is worth noting that bureaucracies have been maligned with the negative connation behind red tape, but the bureaucratic adherence to rules is a feature–and not a bug– that hedges against chaos. The point is that there is no value judgment being made here. It is simply true that Immigration applications are processed on an ongoing basis while laws change and agencies readjust. Applications in progress must be managed around changing immigration laws. When looking at the relationship between legislation and other variables, it is therefore important to also look at the lagged relationships; and not just in one direction. When laws are reactive rather than proactive, there is also a lag between the motivation for a law and its enactment.
Legislation <- na.approx(window(cbind(
Past_Laws_2yr=stats::lag(Laws, k=2),
Past_Laws_1yr=stats::lag(Laws, k=1),
Current_Laws=Laws,
Future_Laws_1yr=stats::lag(Laws, k=-1),
Future_Laws_2yr=stats::lag(Laws, k=-2)),
start=1790, end=2016), rule = 2)
Legislation and Demographic Factors
First the Kendall correlation between the number of new, one- and two-year old, and one- and two-year out immigration laws and demographic factors is examined. When it comes to population based on Race and Hispanic Origin, legislation has positive correlations between \(\tau=0.49\) and \(\tau=0.56\) for Hispanic origin and nearly every race. The strongest of these correlations are with the number of laws passed one year in the future. The exception to this pattern is Native-American population which has positive correlations that hover around \(\tau=0.25\), the strongest of which is with laws passed two years in the future. All of these correlations are significant at \(\alpha=0.05\). Increases in racial and ethnic subsections of the population are associated with future increases in the number of immigration laws. The strongest correlations are with Other Races then Asian and Pacific Islanders.
r <- corr.test(na.approx(RAHO), Legislation, use="complete.obs", method="kendall")$r
p <- corr.test(na.approx(RAHO), Legislation, use="complete.obs", method="kendall")$p < 0.05
table_df <- matrix(paste(format(round(r, 3), nsmall = 3), ifelse(p, "Yes", "No"), sep=", "),
nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
align="c", caption = "Laws with Race and Hispanic Origin") %>%
add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Laws with Race and Hispanic Origin
|
Correlation, Significant
|
|
Past Laws 2yr
|
Past Laws 1yr
|
Current Laws
|
Future Laws 1yr
|
Future Laws 2yr
|
White
|
0.493, Yes
|
0.504, Yes
|
0.504, Yes
|
0.505, Yes
|
0.498, Yes
|
Black
|
0.493, Yes
|
0.504, Yes
|
0.504, Yes
|
0.505, Yes
|
0.498, Yes
|
Native Am
|
0.225, Yes
|
0.236, Yes
|
0.255, Yes
|
0.272, Yes
|
0.282, Yes
|
Asian Pac
|
0.517, Yes
|
0.527, Yes
|
0.530, Yes
|
0.534, Yes
|
0.530, Yes
|
Other Race
|
0.513, Yes
|
0.529, Yes
|
0.543, Yes
|
0.560, Yes
|
0.555, Yes
|
Hispanic
|
0.493, Yes
|
0.504, Yes
|
0.504, Yes
|
0.505, Yes
|
0.498, Yes
|
For the Native and Foreign-born population, the correlations with the number of immigration laws passed are also moderate and statistically significant at \(\alpha=0.05\). Natives have positive correlations that hover around \(\tau=0.5\). Foreign-born have positive correlations that hover around \(\tau=0.43\). The strongest of these correlations are with the number of laws passed one year in the future. Increases in the native and foreign-born subsections of the population are associated with future increases in the number of immigration laws.
r <- corr.test(na.approx(NAFB), Legislation, use="complete.obs", method="kendall")$r
p <- corr.test(na.approx(NAFB), Legislation, use="complete.obs", method="kendall")$p < 0.05
table_df <- matrix(paste(format(round(r, 3), nsmall = 3), ifelse(p, "Yes", "No"), sep=", "),
nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
align="c", caption = "Laws with Native and Foreign-born") %>%
add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Laws with Native and Foreign-born
|
Correlation, Significant
|
|
Past Laws 2yr
|
Past Laws 1yr
|
Current Laws
|
Future Laws 1yr
|
Future Laws 2yr
|
Native Born
|
0.493, Yes
|
0.504, Yes
|
0.504, Yes
|
0.505, Yes
|
0.498, Yes
|
Foreign Born
|
0.424, Yes
|
0.433, Yes
|
0.434, Yes
|
0.434, Yes
|
0.425, Yes
|
The correlations with the number of immigration laws passed and Foreign-born Country of Origin are higher for Asia, Oceania, and Latin America which have correlations between \(\tau=0.49\) and \(\tau=0.51\). The strongest of these correlations are with the number of laws passed one year in the future. The correlations for Africa are around \(\tau=0.38\) and exhibit the strongest relationship with laws passed one year in the future. The same pattern holds for all these demographic factors and laws passed one year prior. The correlations of Europe and North America with immigration laws are between \(\tau=0.18\) and \(\tau=0.27\) with Europe on the lower end. The strongest of these correlations are associated with the number of laws passed two years earlier. The same pattern holds for these two demographic factors and laws passed two years prior. All of these correlations are significant at \(\alpha=0.05\). Increases in the foreign-born population from various regions are associated with future increases in the number of immigration laws; and the number of immigration laws passed is associated with future increases in the foreign-born population from various regions. There exists a bidirectional relationship between immigration laws and demographics.
r <- corr.test(na.approx(FBCO), Legislation, use="complete.obs", method="kendall")$r
p <- corr.test(na.approx(FBCO), Legislation, use="complete.obs", method="kendall")$p < 0.05
table_df <- matrix(paste(format(round(r, 3), nsmall = 3), ifelse(p, "Yes", "No"), sep=", "),
nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
align="c", caption = "Laws with Foreign-born Country of Origin") %>%
add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Laws with Foreign-born Country of Origin
|
Correlation, Significant
|
|
Past Laws 2yr
|
Past Laws 1yr
|
Current Laws
|
Future Laws 1yr
|
Future Laws 2yr
|
Europe
|
0.221, Yes
|
0.218, Yes
|
0.206, Yes
|
0.195, Yes
|
0.182, Yes
|
Asia
|
0.493, Yes
|
0.504, Yes
|
0.504, Yes
|
0.505, Yes
|
0.498, Yes
|
Africa
|
0.353, Yes
|
0.367, Yes
|
0.383, Yes
|
0.398, Yes
|
0.404, Yes
|
Oceania
|
0.495, Yes
|
0.506, Yes
|
0.507, Yes
|
0.507, Yes
|
0.501, Yes
|
Latin Am
|
0.493, Yes
|
0.504, Yes
|
0.504, Yes
|
0.505, Yes
|
0.498, Yes
|
North Am
|
0.267, Yes
|
0.267, Yes
|
0.258, Yes
|
0.251, Yes
|
0.237, Yes
|
Legislation and Economic Factors
The Kendall correlation between the number of new, one-year old, two-year old, and three-year old immigration laws and economic factors is examined next. The correlations are higher for employment related Visas than they are for non-employment related Visas. Although these correlations are larger, they are not very high. The correlations for employment related Visas are between \(\tau=0.115\) and \(\tau=0.315\), the largest correlations found two years after immigration laws are passed. Yet none of these correlations are significant at \(\alpha=0.05\).
r <- corr.test(Visas, Legislation, use="complete.obs", method="kendall")$r
p <- corr.test(Visas, Legislation, use="complete.obs", method="kendall")$p < 0.05
table_df <- matrix(paste(format(round(r, 3), nsmall = 3), ifelse(p, "Yes", "No"), sep=", "),
nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
align="c", caption = "Laws with Immigration Visas") %>%
add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Laws with Immigration Visas
|
Correlation, Significant
|
|
Past Laws 2yr
|
Past Laws 1yr
|
Current Laws
|
Future Laws 1yr
|
Future Laws 2yr
|
IV Emp
|
0.126, No
|
0.167, No
|
0.160, No
|
0.163, No
|
0.315, No
|
IV NonEmp
|
0.113, No
|
0.091, No
|
0.115, No
|
0.062, No
|
0.194, No
|
NIV Emp
|
0.115, No
|
0.150, No
|
0.189, No
|
0.237, No
|
0.305, No
|
NIV NonEmp
|
0.106, No
|
0.078, No
|
0.133, No
|
0.052, No
|
0.170, No
|
Enforcement actions have low correlations with the number of immigration laws passed. The correlations are fairly constant across lags ranging between \(\tau=0.167\) and \(\tau=0.238\). None of these correlations are significant at \(\alpha=0.05\).
r <- corr.test(Enforce, Legislation, use="complete.obs", method="kendall")$r
p <- corr.test(Enforce, Legislation, use="complete.obs", method="kendall")$p < 0.05
table_df <- matrix(paste(format(round(r, 3), nsmall = 3), ifelse(p, "Yes", "No"), sep=", "),
nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
align="c", caption = "Laws with Immigration Enforcement Actions") %>%
add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Laws with Immigration Enforcement Actions
|
Correlation, Significant
|
|
Past Laws 2yr
|
Past Laws 1yr
|
Current Laws
|
Future Laws 1yr
|
Future Laws 2yr
|
Removed
|
0.167, No
|
0.191, No
|
0.209, No
|
0.223, No
|
0.214, No
|
Apprehended
|
0.199, No
|
0.184, No
|
0.207, No
|
0.213, No
|
0.238, No
|
Returned
|
0.220, No
|
0.195, No
|
0.215, No
|
0.205, No
|
0.215, No
|
The largest absolute correlations with the number of immigration laws passed and GDP by Industry can be found two years after immigration laws are passed. Just like with Visas however, although these correlations are larger, they are not very high. The correlations between immigration laws and GDP by Industry range between \(\tau=|0.02|\) and \(\tau=|0.305|\); and none significant at \(\alpha=0.05\).
r <- corr.test(Economy, Legislation, use="complete.obs", method="kendall")$r
p <- corr.test(Economy, Legislation, use="complete.obs", method="kendall")$p < 0.05
table_df <- matrix(paste(format(round(r, 3), nsmall = 3), ifelse(p, "Yes", "No"), sep=", "),
nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
align="c", caption = "Laws with the Economy") %>%
add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Laws with the Economy
|
Correlation, Significant
|
|
Past Laws 2yr
|
Past Laws 1yr
|
Current Laws
|
Future Laws 1yr
|
Future Laws 2yr
|
Agriculture
|
-0.067, No
|
0.018, No
|
0.086, No
|
0.160, No
|
0.199, No
|
Mining
|
-0.037, No
|
0.073, No
|
0.232, No
|
0.201, No
|
0.270, No
|
Utilities
|
0.017, No
|
0.108, No
|
0.116, No
|
0.191, No
|
0.285, No
|
Construction
|
-0.002, No
|
0.013, No
|
0.076, No
|
0.181, No
|
0.260, No
|
Manufacturing
|
-0.057, No
|
0.013, No
|
0.136, No
|
0.155, No
|
0.290, No
|
Wholesale
|
-0.082, No
|
-0.028, No
|
0.101, No
|
0.150, No
|
0.290, No
|
Retail
|
-0.077, No
|
-0.028, No
|
0.091, No
|
0.145, No
|
0.295, No
|
Transportation
|
-0.077, No
|
-0.028, No
|
0.121, No
|
0.165, No
|
0.280, No
|
Information
|
-0.087, No
|
-0.023, No
|
0.096, No
|
0.160, No
|
0.290, No
|
FIRE
|
-0.082, No
|
-0.013, No
|
0.096, No
|
0.160, No
|
0.270, No
|
Service
|
-0.087, No
|
-0.023, No
|
0.101, No
|
0.160, No
|
0.285, No
|
Education
|
-0.092, No
|
-0.023, No
|
0.091, No
|
0.170, No
|
0.290, No
|
Entertainment
|
-0.087, No
|
-0.023, No
|
0.101, No
|
0.160, No
|
0.285, No
|
Other Industries
|
-0.082, No
|
-0.023, No
|
0.096, No
|
0.165, No
|
0.285, No
|
Govt Federal
|
-0.027, No
|
0.013, No
|
0.131, No
|
0.186, No
|
0.305, No
|
Govt StateLoc
|
-0.092, No
|
-0.023, No
|
0.091, No
|
0.170, No
|
0.290, No
|
Demographic and Economic Factors
As previously mentioned, Kendall correlations between the economy and the number of immigration laws passed are statistically insignificant. This is not true for Kendall correlations between industries in the economy and many other factors. Correlations between industries in the economy, every race, and Hispanic origin are all high and statistically significant at \(\alpha=0.05\). Income for every industry increases with increases these racial and ethnic subsections of the population.
r <- round(corr.test(Economy, na.approx(RAHO), use="complete.obs", method="kendall")$r, 5)
p <- corr.test(Economy, na.approx(RAHO), use="complete.obs", method="kendall")$p < 0.05
table_df <- matrix(paste(format(round(r, 3), nsmall = 3), ifelse(p, "Yes", "No"), sep=", "),
nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
align="c", caption = "Economy with Race and Hispanic Origin") %>%
add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Economy with Race and Hispanic Origin
|
Correlation, Significant
|
|
White
|
Black
|
Native Am
|
Asian Pac
|
Other Race
|
Hispanic
|
Agriculture
|
0.834, Yes
|
0.830, Yes
|
0.595, Yes
|
0.830, Yes
|
0.660, Yes
|
0.830, Yes
|
Mining
|
0.770, Yes
|
0.766, Yes
|
0.632, Yes
|
0.766, Yes
|
0.669, Yes
|
0.766, Yes
|
Utilities
|
0.701, Yes
|
0.697, Yes
|
0.738, Yes
|
0.697, Yes
|
0.747, Yes
|
0.697, Yes
|
Construction
|
0.793, Yes
|
0.798, Yes
|
0.756, Yes
|
0.798, Yes
|
0.830, Yes
|
0.798, Yes
|
Manufacturing
|
0.903, Yes
|
0.899, Yes
|
0.664, Yes
|
0.899, Yes
|
0.729, Yes
|
0.899, Yes
|
Wholesale
|
0.963, Yes
|
0.968, Yes
|
0.724, Yes
|
0.968, Yes
|
0.798, Yes
|
0.968, Yes
|
Retail
|
0.963, Yes
|
0.968, Yes
|
0.724, Yes
|
0.968, Yes
|
0.798, Yes
|
0.968, Yes
|
Transportation
|
0.963, Yes
|
0.959, Yes
|
0.715, Yes
|
0.959, Yes
|
0.789, Yes
|
0.959, Yes
|
Information
|
0.986, Yes
|
0.991, Yes
|
0.747, Yes
|
0.991, Yes
|
0.821, Yes
|
0.991, Yes
|
FIRE
|
0.959, Yes
|
0.963, Yes
|
0.729, Yes
|
0.963, Yes
|
0.802, Yes
|
0.963, Yes
|
Service
|
0.982, Yes
|
0.986, Yes
|
0.743, Yes
|
0.986, Yes
|
0.816, Yes
|
0.986, Yes
|
Education
|
0.995, Yes
|
1.000, Yes
|
0.756, Yes
|
1.000, Yes
|
0.830, Yes
|
1.000, Yes
|
Entertainment
|
0.982, Yes
|
0.986, Yes
|
0.743, Yes
|
0.986, Yes
|
0.816, Yes
|
0.986, Yes
|
Other Industries
|
0.977, Yes
|
0.982, Yes
|
0.738, Yes
|
0.982, Yes
|
0.811, Yes
|
0.982, Yes
|
Govt Federal
|
0.931, Yes
|
0.936, Yes
|
0.747, Yes
|
0.936, Yes
|
0.802, Yes
|
0.936, Yes
|
Govt StateLoc
|
0.995, Yes
|
1.000, Yes
|
0.756, Yes
|
1.000, Yes
|
0.830, Yes
|
1.000, Yes
|
The same is true for the Native and Foreign-born population and every industry. Correlations are high and statistically significant at \(\alpha=0.05\). Income for every industry increases with increases in native and foreign-born subsections of the population.
r <- round(corr.test(Economy, na.approx(NAFB), use="complete.obs", method="kendall")$r, 5)
p <- corr.test(Economy, na.approx(NAFB), use="complete.obs", method="kendall")$p < 0.05
table_df <- matrix(paste(format(round(r, 2), nsmall = 2), ifelse(p, "Yes", "No"), sep=", "),
nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
align="c", caption = "Economy with Native and Foreign-born") %>%
add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Economy with Native and Foreign-born
|
Correlation, Significant
|
|
Native Born
|
Foreign Born
|
Agriculture
|
0.83, Yes
|
0.83, Yes
|
Mining
|
0.77, Yes
|
0.77, Yes
|
Utilities
|
0.70, Yes
|
0.70, Yes
|
Construction
|
0.80, Yes
|
0.80, Yes
|
Manufacturing
|
0.90, Yes
|
0.90, Yes
|
Wholesale
|
0.97, Yes
|
0.97, Yes
|
Retail
|
0.97, Yes
|
0.97, Yes
|
Transportation
|
0.96, Yes
|
0.96, Yes
|
Information
|
0.99, Yes
|
0.99, Yes
|
FIRE
|
0.96, Yes
|
0.96, Yes
|
Service
|
0.99, Yes
|
0.99, Yes
|
Education
|
1.00, Yes
|
1.00, Yes
|
Entertainment
|
0.99, Yes
|
0.99, Yes
|
Other Industries
|
0.98, Yes
|
0.98, Yes
|
Govt Federal
|
0.94, Yes
|
0.94, Yes
|
Govt StateLoc
|
1.00, Yes
|
1.00, Yes
|
Country of origin does not show the same homogeneity in statistical significance with every sector the economy like other demographic factors however. At an \(\alpha=0.05\) Asia, Africa, Oceania, and Latin America each have high and statistically significant correlations with every industry, but Europe and North America both have statistically insignificant correlations with every industry. Income for every industry increases with increases in the foreign-born population from Asia, Africa, Oceania, and Latin America.
r <- round(corr.test(Economy, na.approx(FBCO), use="complete.obs", method="kendall")$r, 5)
p <- corr.test(Economy, na.approx(FBCO), use="complete.obs", method="kendall")$p < 0.05
table_df <- matrix(paste(format(round(r, 2), nsmall = 2), ifelse(p, "Yes", "No"), sep=", "),
nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
align="c", caption = "Economy with Foreign-born Country of Origin") %>%
add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Economy with Foreign-born Country of Origin
|
Correlation, Significant
|
|
Europe
|
Asia
|
Africa
|
Oceania
|
Latin Am
|
North Am
|
Agriculture
|
0.24, No
|
0.83, Yes
|
0.83, Yes
|
0.83, Yes
|
0.83, Yes
|
0.30, No
|
Mining
|
0.30, No
|
0.77, Yes
|
0.77, Yes
|
0.76, Yes
|
0.77, Yes
|
0.31, No
|
Utilities
|
0.46, No
|
0.70, Yes
|
0.70, Yes
|
0.69, Yes
|
0.70, Yes
|
0.36, No
|
Construction
|
0.43, No
|
0.80, Yes
|
0.80, Yes
|
0.79, Yes
|
0.80, Yes
|
0.41, No
|
Manufacturing
|
0.34, No
|
0.90, Yes
|
0.90, Yes
|
0.89, Yes
|
0.90, Yes
|
0.41, No
|
Wholesale
|
0.34, No
|
0.97, Yes
|
0.97, Yes
|
0.96, Yes
|
0.97, Yes
|
0.38, No
|
Retail
|
0.34, No
|
0.97, Yes
|
0.97, Yes
|
0.96, Yes
|
0.97, Yes
|
0.38, No
|
Transportation
|
0.34, No
|
0.96, Yes
|
0.96, Yes
|
0.95, Yes
|
0.96, Yes
|
0.39, No
|
Information
|
0.31, No
|
0.99, Yes
|
0.99, Yes
|
0.99, Yes
|
0.99, Yes
|
0.36, No
|
FIRE
|
0.33, No
|
0.96, Yes
|
0.96, Yes
|
0.96, Yes
|
0.96, Yes
|
0.39, No
|
Service
|
0.32, No
|
0.99, Yes
|
0.99, Yes
|
0.98, Yes
|
0.99, Yes
|
0.37, No
|
Education
|
0.31, No
|
1.00, Yes
|
1.00, Yes
|
1.00, Yes
|
1.00, Yes
|
0.35, No
|
Entertainment
|
0.32, No
|
0.99, Yes
|
0.99, Yes
|
0.98, Yes
|
0.99, Yes
|
0.37, No
|
Other Industries
|
0.31, No
|
0.98, Yes
|
0.98, Yes
|
0.98, Yes
|
0.98, Yes
|
0.36, No
|
Govt Federal
|
0.36, No
|
0.94, Yes
|
0.94, Yes
|
0.94, Yes
|
0.94, Yes
|
0.34, No
|
Govt StateLoc
|
0.31, No
|
1.00, Yes
|
1.00, Yes
|
1.00, Yes
|
1.00, Yes
|
0.35, No
|
There is also a dichotomy between Immigrant and Nonimmigrant Visas and their relationship with the economy. At an \(\alpha=0.05\), employment-related Nonimmigrant Visas have statistically significant correlations with every industry, but employment related Immigrant Visas only have statistically significant correlations with Utilities. Non-employment related Nonimmigrant and Immigrant Visas do not have statistically significant correlations with any industry. Income for every industry increases with increases in employment-related Nonimmigrant Visas. Income for Utilities increases with increases in all employment-related Immigrant Visas. This relationship between employment-related Immigrant Visas and Utilities could also be a statistical anomaly. Within the context of statistical tests, at least one in twenty false positives are expected are a 95% confidence level.
r <- corr.test(Economy, Visas, use="complete.obs", method="kendall")$r
p <- corr.test(Economy, Visas, use="complete.obs", method="kendall")$p < 0.05
table_df <- matrix(paste(format(round(r, 2), nsmall = 2), ifelse(p, "Yes", "No"), sep=", "),
nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
align="c", caption = "Economy with Immigration Visas") %>%
add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Economy with Immigration Visas
|
Correlation, Significant
|
|
IV Emp
|
IV NonEmp
|
NIV Emp
|
NIV NonEmp
|
Agriculture
|
0.43, No
|
0.20, No
|
0.81, Yes
|
0.20, No
|
Mining
|
0.48, No
|
0.31, No
|
0.76, Yes
|
0.29, No
|
Utilities
|
0.63, Yes
|
0.30, No
|
0.72, Yes
|
0.24, No
|
Construction
|
0.48, No
|
0.30, No
|
0.78, Yes
|
0.25, No
|
Manufacturing
|
0.45, No
|
0.23, No
|
0.89, Yes
|
0.21, No
|
Wholesale
|
0.43, No
|
0.28, No
|
0.95, Yes
|
0.26, No
|
Retail
|
0.42, No
|
0.28, No
|
0.95, Yes
|
0.26, No
|
Transportation
|
0.43, No
|
0.27, No
|
0.94, Yes
|
0.26, No
|
Information
|
0.42, No
|
0.28, No
|
0.96, Yes
|
0.27, No
|
FIRE
|
0.44, No
|
0.28, No
|
0.94, Yes
|
0.27, No
|
Service
|
0.43, No
|
0.29, No
|
0.96, Yes
|
0.27, No
|
Education
|
0.41, No
|
0.29, No
|
0.95, Yes
|
0.28, No
|
Entertainment
|
0.43, No
|
0.29, No
|
0.96, Yes
|
0.27, No
|
Other Industries
|
0.42, No
|
0.28, No
|
0.95, Yes
|
0.27, No
|
Govt Federal
|
0.39, No
|
0.30, No
|
0.91, Yes
|
0.29, No
|
Govt StateLoc
|
0.41, No
|
0.29, No
|
0.95, Yes
|
0.28, No
|
Enforcement actions also have mixed results. Apprehensions and Returns have statistically insignificant correlations with every industry in the economy, but Removals have strong statistically significant positive correlations with every industry in the economy at \(\alpha=0.05\). Income for every industry increases with increases in Removals. As stated earlier, Removals are compulsory actions based on a court order and Returns represent immigrants stopped (without a court order) from entering at the borders.
r <- corr.test(Economy, Enforce, use="complete.obs", method="kendall")$r
p <- corr.test(Economy, Enforce, use="complete.obs", method="kendall")$p < 0.05
table_df <- matrix(paste(format(round(r, 2), nsmall = 2), ifelse(p, "Yes", "No"), sep=", "),
nrow=nrow(r), dimnames=dimnames(r) )
colnames(table_df) <- gsub('_', ' ', colnames(table_df))
rownames(table_df) <- gsub('_', ' ', rownames(table_df))
kable(table_df, longtable = T, booktabs = T, row.names=T, linesep = "",
align="c", caption = "Economy with Immigration Enforcement") %>%
add_header_above(c(" ", "Correlation, Significant" = ncol(table_df))) %>%
kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Economy with Immigration Enforcement
|
Correlation, Significant
|
|
Removed
|
Apprehended
|
Returned
|
Agriculture
|
0.81, Yes
|
-0.39, No
|
-0.41, No
|
Mining
|
0.80, Yes
|
-0.35, No
|
-0.39, No
|
Utilities
|
0.67, Yes
|
-0.19, No
|
-0.22, No
|
Construction
|
0.71, Yes
|
-0.27, No
|
-0.31, No
|
Manufacturing
|
0.87, Yes
|
-0.33, No
|
-0.37, No
|
Wholesale
|
0.85, Yes
|
-0.38, No
|
-0.43, No
|
Retail
|
0.85, Yes
|
-0.39, No
|
-0.43, No
|
Transportation
|
0.86, Yes
|
-0.38, No
|
-0.42, No
|
Information
|
0.88, Yes
|
-0.40, No
|
-0.45, No
|
FIRE
|
0.85, Yes
|
-0.38, No
|
-0.43, No
|
Service
|
0.87, Yes
|
-0.40, No
|
-0.45, No
|
Education
|
0.89, Yes
|
-0.41, No
|
-0.46, No
|
Entertainment
|
0.87, Yes
|
-0.40, No
|
-0.45, No
|
Other Industries
|
0.87, Yes
|
-0.40, No
|
-0.45, No
|
Govt Federal
|
0.90, Yes
|
-0.38, No
|
-0.41, No
|
Govt StateLoc
|
0.89, Yes
|
-0.41, No
|
-0.46, No
|
The largest statistically significant correlations are between demographics factors and the number of immigration laws passed one year later. The difference between all the lagged correlations is not very large however, it averages around \(\tau \pm 0.02\).
Causality Analysis
\[Y_{ t }=\left( a_{ 0 }+a_{ 1 }Y_{ t-1 }+\cdots +a_{ p }Y_{ t-p } \right) +\left( b_{ 1 }X_{ t-1 }+\cdots +b_{ p }X_{ t-p } \right) +u_{ t } \tag{Y-Consequence}\]\[X_{ t }=\left( c_{ 0 }+c_{ 1 }X_{ t-1 }+\cdots +c_{ p }X_{ t-p } \right) +\left( d_{ 1 }Y_{ t-1 }+\cdots +d_{ p }Y_{ t-p } \right) +v_{ t } \tag{Y-Antecedent}\]
Granger Causality tests are useful for evaluating time series pairs for causality such that “\(X\) is said to Granger-cause \(Y\) if \(Y\) can be better predicted using the histories of both \(X\) and \(Y\) than it can by using the history of \(Y\) alone.” Essentially, the test checks whether one variable is useful in predicting the other taking lagging of both variables into consideration. In terms of the above two equations, if in the \(Y\)-Consequence equation \(H_{ A }: \exists b_{ i } \neq 0\) holds true, then \(X\) is a Granger-cause of \(Y\). If in the \(Y\)-Antecedent equation \(H_{ A }: \exists d_{ i } \neq 0\) holds true, then \(Y\) is a Granger-cause of \(X\). The test has limitations in multivariate cases and does not rule out confounding variables, but it remains one of the most useful tools for analyzing causality in model building. If one variable could be used to predict future change in the other, then it is reasonable to build a regression model that uses the predictor variable to explain future changes in the other variable. To conduct the test effectively, the maximum order of integration \(d\) for the time series group must first be determined.
Correlated <- data.frame(Laws, RAHO, NAFB, FBCO)
Arima_Orders <- data.frame(matrix(NA, nrow = ncol(Correlated), ncol = 5))
for (i in 1:ncol(Correlated)){
ts_var <- na.approx(Correlated[, i])
adf <- tseries::adf.test(ts_var)$p.value < 0.05
Arima_Orders[i, 1] <- ifelse(adf, "Yes", "No")
l <- BoxCox.lambda(ts_var)
Arima_Orders[i, 2] <- format(round(l, 3), nsmall = 2)
aa <- auto.arima(ts_var, stepwise=F, approximation=F, d=ifelse(adf, 0, NA), lambda=l)
Arima_Orders[i, 3] <- aa$arma[1]
Arima_Orders[i, 4] <- aa$arma[6]
Arima_Orders[i, 5] <- aa$arma[2]
}
colnames(Arima_Orders) <- c("Stationary", "Box-Cox Transform", "p", "d", "q")
rownames(Arima_Orders) <- gsub('_', ' ', colnames(Correlated))
kable(Arima_Orders, longtable = T, booktabs = T, row.names=T, linesep = "",
align="c", caption = "Time Series ARIMA Modeling") %>%
kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Time Series ARIMA Modeling
|
Stationary
|
Box-Cox Transform
|
p
|
d
|
q
|
Laws
|
Yes
|
0.274
|
5
|
0
|
0
|
White
|
No
|
0.374
|
0
|
2
|
1
|
Black
|
No
|
0.22
|
1
|
1
|
0
|
Native Am
|
Yes
|
-0.366
|
5
|
0
|
0
|
Asian Pac
|
No
|
0.255
|
1
|
2
|
2
|
Other Race
|
No
|
0.255
|
1
|
2
|
2
|
Hispanic
|
Yes
|
0.081
|
0
|
0
|
0
|
Native Born
|
No
|
0.302
|
1
|
2
|
1
|
Foreign Born
|
No
|
-0.004
|
0
|
2
|
0
|
Europe
|
No
|
0.217
|
2
|
2
|
3
|
Asia
|
No
|
0.12
|
1
|
1
|
1
|
Africa
|
No
|
0.44
|
1
|
2
|
2
|
Oceania
|
No
|
-0.019
|
4
|
2
|
1
|
Latin Am
|
No
|
0.113
|
1
|
1
|
1
|
North Am
|
No
|
0.54
|
2
|
2
|
2
|
The auto.arima()
function in R
chooses ARIMA models automatically using a variation of the Hyndman and Khandakar algorithm which combines unit root tests, minimization of the AICc, and MLEs to obtain an ARIMA model. The results suggest that the maximum order of integration is \(d = 2\). Using this maximum order of integration, tests are then run to check for the existence of Granger Causality.
Causality <- data.frame(matrix(NA, nrow = ncol(Correlated[,-1]), ncol = 2))
for (i in 2:ncol(Correlated)){
for (j in 1:2) {
cause <- as.data.frame(na.approx(Correlated[,ifelse(j==1, 1, i)]))
effect <- as.data.frame(na.approx(Correlated[,ifelse(j==1, i, 1)]))
d <- max(Arima_Orders[,"d"])
gtest <- grangertest(cause, effect, order = d)
f <- format(round(gtest$F[2], 2), nsmall = 2)
p <- gtest$`Pr(>F)`[2]<0.05
Causality[i-1,j] <- paste(f, ifelse(p, "Yes", "No"), sep=", ")
}
}
colnames(Causality) <- c("Number of Laws (Antecedent)",
"Number of Laws (Consequence)")
rownames(Causality) <- gsub('_', ' ', colnames(Correlated[,-1]))
kable(Causality, longtable = T, booktabs = T, row.names=T, linesep = "",
align="c", caption = "Legislation-Demographic Causality Test") %>%
add_header_above(c(" ", "Granger Causality Statistic, Significant" = ncol(Causality))) %>%
kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Legislation-Demographic Causality Test
|
Granger Causality Statistic, Significant
|
|
Number of Laws (Antecedent)
|
Number of Laws (Consequence)
|
White
|
1.61, No
|
12.58, Yes
|
Black
|
0.83, No
|
13.51, Yes
|
Native Am
|
2.68, No
|
8.12, Yes
|
Asian Pac
|
2.85, No
|
5.86, Yes
|
Other Race
|
1.28, No
|
8.47, Yes
|
Hispanic
|
4.17, Yes
|
12.46, Yes
|
Native Born
|
3.94, Yes
|
13.36, Yes
|
Foreign Born
|
0.86, No
|
9.21, Yes
|
Europe
|
1.10, No
|
1.22, No
|
Asia
|
0.33, No
|
6.47, Yes
|
Africa
|
5.02, Yes
|
3.58, Yes
|
Oceania
|
1.78, No
|
9.98, Yes
|
Latin Am
|
5.81, Yes
|
11.49, Yes
|
North Am
|
2.97, No
|
2.27, No
|
The test results indicate that changes in most demographics exhibit the characteristics necessary for being a Granger cause of the number of immigration laws are passed. The exceptions are foreign-born of European and North American origin. Granger Causality does not imply true causality, but the results do indicate that changes in the demographic variables displaying Granger Causality are useful predictors for forecasting future changes in number of immigration laws enacted. The bidirectional causality relationship with some variables (Hispanic, Native Born, Africa, and Latin America) suggests a feedback loop which implies that changes in the number of immigration laws enacted is also a useful predictor for forecasting future changes in those four demographic elements.
Model Selection
Based on the strength of the one-year correlations and bidirectional Granger causality for some of the correlated factors, a few models will be constructed and tested. First, a model that attempts to explain the number of laws that will be passed one year in the future based on changes to multiple demographic factors will be constructed. Then, four smaller models that attempt to explain changes to the Hispanic, Native born, foreign-born from Africa, and foreign-born from Latin America population segments based on the number of laws passed one year prior will be constructed. Given the oddities in the distributions of some of these time series’, multiple regression model types are tuned and examined to see which model type is most suited for these data.
- Linear Regression Models
- Robust Linear Regression (RLM)
- Principal Component Regression (PCR)
- Partial Least Squares (PLS)
- Elastic Net Regression (ENET)
- Nonlinear Regression Models
- Artificial Neural Networks (ANN)
- Multivariate Adaptive Regression Splines (MARS)
- Support Vector Machines (SVM)
- \(K\)-Nearest Neighbors (KNN)
- Tree-Based Regression Models
- Classification and Regression Tree (CART)
- Random Forest (RF)
- Stochastic Gradient Boosting (SGB)
- Rule-Based Cubist (CUBE)
model_vars <- na.approx(Correlated[,c(1,3:14)])
set.seed(698)
rows_train <- createDataPartition(model_vars[,1], p=0.75, list=F)
X_train <- model_vars[rows_train, -1]
X_test <- model_vars[-rows_train, -1]
Y_train <- model_vars[rows_train, 1]
Y_test <- model_vars[-rows_train, 1]
set.seed(698)
ctrl <- trainControl(method = "cv", number = 10)
Linear Regression Models
set.seed(698)
tune01 <- train(x = lag(X_train, k=1), y = Y_train,
preProcess = c("BoxCox","center","scale","knnImpute"),
method = "rlm", trControl = ctrl)
plot(tune01, main=tune01$modelInfo$label)

set.seed(698)
tune02 <- train(x = lag(X_train, k=1), y = Y_train,
preProcess = c("BoxCox","center","scale","knnImpute"),
method = "pcr", trControl = ctrl, tuneLength = 25)
plot(tune02, main=tune02$modelInfo$label)

set.seed(698)
tune03 <- train(x = lag(X_train, k=1), y = Y_train,
preProcess = c("BoxCox","center","scale","knnImpute"),
method = "pls", trControl = ctrl, tuneLength = 25)
plot(tune03, main=tune03$modelInfo$label)

set.seed(698)
tg <- expand.grid(lambda = c(0, 0.05, .1), fraction = seq(0.05, 1, length = 25))
tune04 <- train(x = lag(X_train, k=1), y = Y_train,
preProcess = c("BoxCox","center","scale","knnImpute"),
method = "enet", trControl = ctrl, tuneGrid = tg)
plot(tune04, main=tune04$modelInfo$label)

Nonlinear Regression Models
set.seed(698)
tg <- expand.grid(.decay = c(0, 0.01, .1), .size = c(1:10), .bag = F)
tune05 <- train(x = lag(X_train, k=1), y = Y_train,
method = "avNNet", tuneGrid = tg, trControl = ctrl, linout = T,
preProcess = c("BoxCox","center","scale","knnImpute"),
trace = F, MaxNWts = 10 * (ncol(X_train) + 1) + 10 + 1, maxit = 500)
plot(tune05, main=tune05$modelInfo$label)

set.seed(698)
tg <- expand.grid(degree = c(1:2), nprune = c(2:10))
tune06 <- train(x = lag(X_train, k=1), y = Y_train,
preProcess = c("BoxCox","center","scale","knnImpute"),
method = "earth", tuneGrid = tg, trControl = ctrl)
plot(tune06, main=tune06$modelInfo$label)

set.seed(698)
tg <- expand.grid(C=c(0.01,0.05,0.1), degree=c(1,2), scale=c(0.25,0.5,1))
tune07 <- train(x = lag(X_train, k=1), y = Y_train,
preProcess = c("BoxCox","center","scale","knnImpute"),
method = "svmPoly", tuneGrid = tg, trControl = ctrl)
plot(tune07, main=tune07$modelInfo$label)

set.seed(698)
tg <- data.frame(.k = 1:20)
tune08 <- train(x = lag(X_train, k=1), y = Y_train,
preProcess = c("BoxCox","center","scale","knnImpute"),
method = "knn", tuneGrid = tg, trControl = trainControl(method = "cv"))
plot(tune08, main=tune08$modelInfo$label)

Tree-based Regression Models
set.seed(698)
tg <- expand.grid(maxdepth= seq(1,10,by=1))
tune09 <- train(x = lag(X_train, k=1), y = Y_train,
preProcess = c("BoxCox","center","scale","knnImpute"),
method = "rpart2", tuneGrid = tg, trControl = ctrl)
plot(tune09, main=tune09$modelInfo$label)

set.seed(698)
P <- ncol(X_train)
tg <- expand.grid(mtry=seq(2, P, by = floor(P/5)))
tune10 <- train(x = lag(X_train, k=1), y = Y_train,
preProcess = c("BoxCox","center","scale","knnImpute"),
method = "rf", tuneGrid = tg, trControl = ctrl)
plot(tune10, main=tune10$modelInfo$label)

set.seed(698)
tg <- expand.grid(interaction.depth=seq(1,6,by=1), n.trees=c(25,50,100,200),
shrinkage=c(0.01,0.05,0.1,0.2), n.minobsinnode=10)
tune11 <- train(x = lag(X_train, k=1), y = Y_train,
preProcess = c("BoxCox","center","scale","knnImpute"),
method = "gbm", tuneGrid = tg, trControl = ctrl, verbose=F)
plot(tune11, main=tune11$modelInfo$label)

set.seed(698)
tg <- expand.grid(committees = c(1,5,10,20,50,100), neighbors = c(0,1,3,5,7))
tune12 <- train(x = lag(X_train, k=1), y = Y_train,
preProcess = c("BoxCox","center","scale","knnImpute"),
method = "cubist", tuneGrid = tg, trControl = ctrl)
plot(tune12, main=tune12$modelInfo$label)

Model Comparison
All the models perform almost equivalently on these data. As a matter of fact, the optimal RMSE, \(R^2\), and MAE training set resampling performance metrics are associated with different models depending on the random seed. The need to validate training set resampling results–which are highly optimistic as a result of the repeated sampling– with a test set is highlighted by cases such as this where the resampling performance metrics are nearly identical. Based on the RMSE of the training set resampling the respectively optimal linear, nonlinear, and tree-based models based are the Partial Least Squares (PLS), \(K\)-Nearest Neighbors (KNN), and Stochastic Gradient Boosting (SGB) models. These training set resampling rankings hold for the PLS and KNN models after running the test set validation. Since the model performance is essentially equivalent and the goal of this modeling process is explanation of relationships between variables rather than prediction, the non-covariance based KNN model which cannot be cleanly summarized will not be used. The focus will be on covariance based linear models.
fits <- list(RLM=tune01, PCR=tune02, PLS=tune03, ENET=tune04, ANN=tune05, MARS=tune06,
SVM=tune07, KNN=tune08, CART=tune09, RF=tune10, SGB=tune11, CUBE=tune12)
bwplot(resamples(fits))

metrics <- function(tune) {
RMSE = min(tune$results$RMSE)
Rsquared = max(tune$results$Rsquared)
MAE = min(tune$results$MAE)
return(cbind(RMSE, Rsquared, MAE)) }
resampling <- data.frame(rbind(metrics(tune01), metrics(tune02),
metrics(tune03), metrics(tune04), metrics(tune05), metrics(tune06),
metrics(tune07), metrics(tune08), metrics(tune09), metrics(tune10),
metrics(tune11), metrics(tune12)), row.names = c("RLM","PCR","PLS",
"ENET", "ANN", "MARS", "SVM", "KNN", "CART", "RF", "SGB", "CUBE"))
validation <- data.frame(row.names = c("RLM","PCR","PLS", "ENET",
"ANN", "MARS", "SVM", "KNN", "CART", "RF", "SGB", "CUBE"), rbind(
postResample(pred = predict(tune01, newdata = X_test), obs = Y_test),
postResample(pred = predict(tune02, newdata = X_test), obs = Y_test),
postResample(pred = predict(tune03, newdata = X_test), obs = Y_test),
postResample(pred = predict(tune04, newdata = X_test), obs = Y_test),
postResample(pred = predict(tune05, newdata = X_test), obs = Y_test),
postResample(pred = predict(tune06, newdata = X_test), obs = Y_test),
postResample(pred = predict(tune07, newdata = X_test), obs = Y_test),
postResample(pred = predict(tune08, newdata = X_test), obs = Y_test),
postResample(pred = predict(tune09, newdata = X_test), obs = Y_test),
postResample(pred = predict(tune10, newdata = X_test), obs = Y_test),
postResample(pred = predict(tune11, newdata = X_test), obs = Y_test),
postResample(pred = predict(tune12, newdata = X_test), obs = Y_test)))
kable(cbind(resampling, validation), longtable = T, booktabs = T, row.names=T, linesep = "",
caption = "Performance Metrics") %>%
add_header_above(c("","Training Set Resampling" = 3, "Test Set Validation" = 3)) %>%
kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Performance Metrics
|
Training Set Resampling
|
Test Set Validation
|
|
RMSE
|
Rsquared
|
MAE
|
RMSE
|
Rsquared
|
MAE
|
RLM
|
1.101253
|
0.4215269
|
0.7128411
|
1.717429
|
0.3141630
|
0.9216199
|
PCR
|
1.087937
|
0.4378742
|
0.7331757
|
1.673304
|
0.3429144
|
0.9411131
|
PLS
|
1.086624
|
0.4257641
|
0.7380014
|
1.672647
|
0.3426580
|
0.9397111
|
ENET
|
1.090662
|
0.4184976
|
0.7339712
|
1.697809
|
0.3291550
|
0.9539869
|
ANN
|
1.134243
|
0.3915584
|
0.7437596
|
1.743432
|
0.2716640
|
0.9865738
|
MARS
|
1.116175
|
0.3960809
|
0.7643166
|
1.683975
|
0.3233568
|
0.9352765
|
SVM
|
1.104148
|
0.4127897
|
0.7148267
|
1.743417
|
0.2898321
|
0.8877538
|
KNN
|
1.057420
|
0.4609521
|
0.7081444
|
1.633050
|
0.3684154
|
0.9598214
|
CART
|
1.127081
|
0.3890166
|
0.7731064
|
1.826424
|
0.1877542
|
1.0078656
|
RF
|
1.242846
|
0.3149483
|
0.8172722
|
1.489037
|
0.4894702
|
1.0142196
|
SGB
|
1.071784
|
0.4307816
|
0.7269328
|
1.699391
|
0.3222708
|
0.9427416
|
CUBE
|
1.116010
|
0.3955091
|
0.7271977
|
1.603782
|
0.3949694
|
0.9653687
|
Coefficient Analysis
LagFwd <- data.frame(stats::lag(na.approx(data698[,1]), k=-1))
LagBwd <- data.frame(stats::lag(na.approx(data698[,1]), k=1))
X <- data.frame(na.approx(data698[,c(3:11,13:16)]))
X[, 2] <- X[, 2] + X[, 3]
colnames(X)[2] <- gsub("_.*", "", colnames(X)[2])
X <- X[, -3]
prepro <- preProcess(X, method=c("BoxCox", "center", "scale", "knnImpute"))
demographics <- predict(prepro, X)
LawsEffect <- data.frame(LagFwd, demographics)
colnames(LawsEffect) <- c("LawsFuture", colnames(demographics))
LawsCause <- data.frame(LagBwd, demographics)
colnames(LawsCause) <- c("LawsPast", colnames(demographics))
Partial Least Squares (PLS)
The first model attempts to explain the number of laws that will be passed one year in the future based on changes to multiple demographic factors. There exists a very high level of multicollinearity in these demographic factors however. In fact, attempting to remove highly correlated predictors removes nine of the twelve variables leaving only foreign-born from Africa, Native-Americans, and Whites. These were the three variables with very unusual distributions due to slavery, genocide, and admissions being based on race until around 1950. Using a PLS model–which is essentially supervised Principal Component Analysis– is a regression technique that works well when a high level of multicollinearity is present. Applying the PLS model with two components (determined by the tuning process) to Box-Cox transformations of these data and examining the residuals for normality with a Shapiro-Wilks test suggests that the PLS model provides an appropriate fit for these data at \(\alpha=0.05\). Examining legislation as a consequence where the cause is demographic factor changes and the effect is the number of laws enacted one year later, the PLS model suggests that the variation in the number of immigration laws enacted is explained in varying amounts by earlier changes in each of the demographic factors. It is important to note that the PLS coefficients reflect the amount of change explained by the demographic factor and not the direction of the change. The direction of change was however, noted in the Correlation Analysis where all were found to have a positive relationship. Therefore, it can be surmised that when there are increases in these demographic elements, the number of immigration laws enacted one year later increases. All of these jackknife variance estimates are statistically significant at \(\alpha=0.05\).
fit1 <- plsr(LawsFuture ~ ., data = LawsEffect, ncomp = 2, validation = "CV", jackknife = T)
shapiro.test(fit1$residuals)
##
## Shapiro-Wilk normality test
##
## data: fit1$residuals
## W = 0.76207, p-value < 2.2e-16
jack <- jack.test(fit1, ncomp = fit1$ncomp, use.mean = T)
model <- data.frame(jack[c(1:3,5)])
model <- model[order(model[,4]>0.05, -abs(model[,1])),]
colnames(model) <- c("Estimate", "Std. Error", "t value", "Pr(>|t|)")
rownames(model) <- gsub('_', ' ', rownames(model))
kable(model, longtable = T, booktabs = T, row.names=T, linesep = "",
caption = "Number of Laws (Consequence)") %>%
add_header_above(c("","Ordered Partial Least Squares (PLS) Coefficients" = ncol(model))) %>%
kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Number of Laws (Consequence)
|
Ordered Partial Least Squares (PLS) Coefficients
|
|
Estimate
|
Std. Error
|
t value
|
Pr(>|t|)
|
Other Race
|
0.1794285
|
0.0598980
|
2.995566
|
0.0150643
|
Native Am
|
0.1733696
|
0.0530547
|
3.267750
|
0.0097176
|
Africa
|
0.1642788
|
0.0325645
|
5.044722
|
0.0006953
|
Asian Pac
|
0.1376798
|
0.0442626
|
3.110526
|
0.0125082
|
Black
|
0.0836869
|
0.0104169
|
8.033746
|
0.0000214
|
Native Born
|
0.0757040
|
0.0084934
|
8.913244
|
0.0000092
|
White
|
0.0737364
|
0.0086038
|
8.570215
|
0.0000127
|
Foreign Born
|
0.0626821
|
0.0106390
|
5.891742
|
0.0002314
|
Hispanic
|
0.0555109
|
0.0092094
|
6.027655
|
0.0001958
|
Oceania
|
0.0480144
|
0.0083247
|
5.767740
|
0.0002702
|
Latin Am
|
0.0434573
|
0.0071833
|
6.049785
|
0.0001905
|
Asia
|
0.0404559
|
0.0070373
|
5.748799
|
0.0002767
|
Robust Linear Model (RLM)
The remaining four models attempt to explain changes to four demographic factors based on the number of laws that were enacted one year prior. Addressing multicollinearity is unnecessary for these small models with one \(X\) variable. There are irregularities in some of the distributions however which suggest the need for a robust method, hence a Robust Linear Model (RLM). Applying the RLM model with bisquare weights and an intercept (determined by the tuning process) to Box-Cox transformations of these data and examining the residuals for normality with a Shapiro-Wilks test suggests that the RLM model provides an appropriate fit for most of these data at \(\alpha=0.05\). The exception is the model for foreign-born from Africa which has a Shapiro-Wilks \(p\)-value of \(0.06\) and an unusual distribution due to the period of slavery. Post-slavery the distribution of the transformed Africa variable is more linear and therefore more suited for an RLM model. Examining legislation as an antecedent where the effect is demographic factor changes and the cause is the number of laws enacted one year prior, the RLM suggests that as the number of immigration laws enacted increases, specific demographic elements change such as the number of Native Born \(\uparrow\), Hispanics \(\uparrow\), and foreign-born from specific countries of origin (Africa \(\uparrow\), Latin America \(\uparrow\)). These relationships are statistically significant at \(\alpha=0.05\).
ctrl <- lmRob.control(weight=c("Bisquare","Bisquare"))
fit2 <- lmRob(Hispanic ~ LawsPast, data = LawsCause, control=ctrl)
fit3 <- lmRob(Native_Born ~ LawsPast, data = LawsCause, control=ctrl)
fit4 <- lmRob(Africa ~ LawsPast, data = LawsCause, control=ctrl)
fit5 <- lmRob(Latin_Am ~ LawsPast, data = LawsCause, control=ctrl)
par(mfrow=c(2, 2), mar = c(0, 0.5, 2, 0), oma = c(0.5, 0.5, 2, 0.5))
plot(LawsCause$Hispanic, type="l", xaxt = "n", yaxt = "n", main="Hispanic", col="blue")
plot(LawsCause$Native_Born, type="l", xaxt = "n", yaxt = "n", main="Native Born", col="blue")
plot(LawsCause$Africa, type="l", xaxt = "n", yaxt = "n", main="Africa", col="blue")
plot(LawsCause$Latin_Am, type="l", xaxt = "n", yaxt = "n", main="Latin America", col="blue")
title("Box-Cox Transformations", outer=TRUE)

shapiro.test(fit2$residuals)
##
## Shapiro-Wilk normality test
##
## data: fit2$residuals
## W = 0.96259, p-value = 1.132e-05
shapiro.test(fit3$residuals)
##
## Shapiro-Wilk normality test
##
## data: fit3$residuals
## W = 0.96916, p-value = 7.518e-05
shapiro.test(fit4$residuals)
##
## Shapiro-Wilk normality test
##
## data: fit4$residuals
## W = 0.98855, p-value = 0.06734
shapiro.test(fit5$residuals)
##
## Shapiro-Wilk normality test
##
## data: fit5$residuals
## W = 0.92388, p-value = 2.025e-09
models <- rbind(Hispanic=summary(fit2)$coefficients[-1,],
Native_Born=summary(fit3)$coefficients[-1,],
Africa=summary(fit4)$coefficients[-1,],
Latin_Am=summary(fit5)$coefficients[-1,])
models <- models[order(models[,1], decreasing = T), ]
rownames(models) <- gsub('_', ' ', rownames(models))
kable(models, longtable = T, booktabs = T, row.names=T, linesep = "",
caption = "Ordered Number of Laws (Antecedent)") %>%
add_header_above(c("","Ordered Robust Linear Model (RLM) Coefficients" = ncol(models))) %>%
kable_styling(latex_options = c("striped", "hold_position", "repeat_header"))
Ordered Number of Laws (Antecedent)
|
Ordered Robust Linear Model (RLM) Coefficients
|
|
Estimate
|
Std. Error
|
t value
|
Pr(>|t|)
|
Native Born
|
0.4700338
|
0.0498572
|
9.427596
|
0
|
Hispanic
|
0.4345160
|
0.0459628
|
9.453639
|
0
|
Latin Am
|
0.3201317
|
0.0369501
|
8.663889
|
0
|
Africa
|
0.3142223
|
0.0481317
|
6.528390
|
0
|