Team members

Group 6

  • Yue Wu
  • YinChia Huang
  • Francesco Ignazio Re

Outline

  • Introduction
  • Data set Visualization & Analysis
  • Statistical Analysis
  • Time Series & Analysis
  • Conclusion

Introduction

Apprehensions at the US-Mexico border have declined to near-historic lows over the last few years. The objective of this report is to give deeper insight into this change. By analyzing the data collected by U.S. Customs and Border Protection over the years, we intend to shed light on the general trend, focusing on how time and place have influenced the outcome.

Overview of 2010 Apprehensions

The gray dots show the Tucson sector, which had the most variable monthly apprehension counts and contributed the most to total apprehensions in 2010.

Overview of 2017 Apprehensions

The pink dots show the Rio Grande Valley sector, which contributed the most to total apprehensions in 2017. Notably, apprehensions increased dramatically from September to October.

2010 vs. 2017 Apprehensions by month

The busiest months of 2010 (March, April, and May) are also the ones with the largest decline in 2017.

2010 vs. 2017 Apprehensions by sector

The greatest change occurred in Tucson, the sector with the highest number of apprehensions in 2010, which saw a drop of over 80% according to the data collected in 2017.

Descriptive Data Analysis

From 2010 to 2017, U.S. Customs and Border Protection saw an overall 36 percent decrease in apprehensions for illegal entry into the country. The plots show markedly different monthly apprehension patterns in the two years. The sectors also changed in different ways from one another.

Statistical Analysis (1)

Let's use simple statistical tests to compare the change in the sectors with the most apprehensions in 2010 and in 2017.

2010 max-sum sector: No. 8 Tucson

## 
##  Paired t-test
## 
## data:  as.numeric(ap10[y, ]) and as.numeric(ap17[y, ])
## t = 6.2428, df = 11, p-value = 6.324e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   9363.28 19560.89
## sample estimates:
## mean of the differences 
##                14462.08
t test: p = 6.324e-05 < 0.05
Conclusion: We can reject the null hypothesis that Tucson's mean monthly apprehensions were equal in 2010 and 2017.
Interpretation: Tucson plays a critical role in the change in apprehensions between 2010 and 2017.
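
As a rough arithmetic check on the output above, the t statistic and the confidence interval can be tied back to the reported mean difference. Here $\bar{d}$ is the mean of the 12 monthly differences (2010 minus 2017) and $\mathrm{SE}$ is its standard error. The standard error is back-computed from the printed values rather than from the raw data, and 2.201 is the usual two-sided 95% critical value of the t distribution with 11 degrees of freedom.

```latex
\[
t = \frac{\bar{d}}{\mathrm{SE}}
\quad\Longrightarrow\quad
\mathrm{SE} = \frac{14462.08}{6.2428} \approx 2316.6
\]
\[
\bar{d} \pm t_{0.975,\,11}\,\mathrm{SE}
\approx 14462.08 \pm 2.201 \times 2316.6
\approx 14462.08 \pm 5098.8
\approx (9363,\ 19561)
\]
```

The result agrees with the printed interval up to rounding, and the interval lies entirely above zero, which is consistent with rejecting the null hypothesis.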

2017 max-sum sector: No.6 Rio Grande Valley

## 
##  Paired t-test
## 
## data:  as.numeric(ap10[y, ]) and as.numeric(ap17[y, ])
## t = -2.4601, df = 11, p-value = 0.03167
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -12283.0782   -682.9218
## sample estimates:
## mean of the differences 
##                   -6483
t test: p= 0.03167 < 0.05
Conclusion: We can reject the null hypothesis. The mean monthly apprehensions in RGV differ between 2010 and 2017 at the 5% significance level.
Interpretation: RGV plays a critical role in the change in apprehensions between 2010 and 2017, though not as large a role as the Tucson sector.
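
The printed results above come from a paired t-test on the twelve monthly counts of a sector in each year. The following is a minimal sketch of that computation using only the Python standard library. It is not the code behind the slides (the output format suggests R's `t.test`), and the function name `paired_t` and the sample numbers are hypothetical.

```python
import math
import statistics


def paired_t(x, y):
    """Return (t statistic, degrees of freedom, mean difference) for paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    mean_d = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation / sqrt(n)).
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if se == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_d / se, len(diffs) - 1, mean_d


# Hypothetical monthly counts, NOT the CBP data used in the slides.
year_a = [120, 135, 150, 160, 155, 140]
year_b = [100, 110, 120, 125, 130, 120]

t_stat, df, mean_d = paired_t(year_a, year_b)
print(f"t = {t_stat:.3f}, df = {df}, mean difference = {mean_d:.2f}")
```

The t statistic is then compared with the critical value of the t distribution for the given degrees of freedom, or converted to a p-value with a statistics package.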

Statistical Analysis (2)

Testing the hypothesis that the three peak months of one year have the same mean apprehensions as the same months of the other year.

Peak in 2010: March, April, May

## 
##  Paired t-test
## 
## data:  as.numeric(xb10) and as.numeric(xb17)
## t = 8.5141, df = 2, p-value = 0.01352
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  20742.45 63125.55
## sample estimates:
## mean of the differences 
##                   41934
t test: p= 0.014 < 0.05
Conclusion: We can reject the null hypothesis. Mean apprehensions for March to May 2010 differ significantly from those for the same months of 2017 at the 5% significance level.
Interpretation: The peak in 2010 is significantly different in value from the same months in 2017.

Peak in 2017: October, November, December

## 
##  Paired t-test
## 
## data:  as.numeric(xa10) and as.numeric(xa17)
## t = -3.2966, df = 2, p-value = 0.081
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -29127.621   3856.288
## sample estimates:
## mean of the differences 
##               -12635.67
t test: p = 0.081 > 0.05
Conclusion: We cannot reject the null hypothesis. Mean apprehensions for October to December 2017 are not significantly different from those for the same months of 2010.
Interpretation: The peak in 2017 is not significantly different in value from the same months in 2010. The test result corresponds with the narrow box plots for those months. However, with only 2 degrees of freedom the test has little power, so this result is less reliable.
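
To see how much the small sample matters, the printed interval for October to December can be reproduced by hand. The standard error is back-computed from the reported mean difference and t statistic, and 4.3027 is the usual two-sided 95% critical value of the t distribution with 2 degrees of freedom.

```latex
\[
\mathrm{SE} = \frac{-12635.67}{-3.2966} \approx 3832.9,
\qquad
t_{0.975,\,2} \approx 4.3027
\]
\[
\bar{d} \pm t_{0.975,\,2}\,\mathrm{SE}
\approx -12635.67 \pm 4.3027 \times 3832.9
\approx -12635.67 \pm 16492
\approx (-29128,\ 3856)
\]
```

The interval contains zero because $|t| = 3.30$ is below 4.3027. With 11 degrees of freedom the critical value would be about 2.201, so a statistic of the same size would have been significant there. The three-month comparison needs a much larger effect to reach significance.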

Time series graph

The time series plot shows a continuous decline since 2000, from a high of over 1.6 million apprehensions in 2000 to around 300,000 in 2017. Over these years, US immigration control policies have developed rapidly, and these changes may be correlated with the change in apprehensions.
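
For scale, the overall decline can be expressed as a percentage. The two endpoints are the rounded figures quoted above, so the result is only approximate.

```latex
\[
\frac{1.6 \times 10^{6} - 0.3 \times 10^{6}}{1.6 \times 10^{6}}
= \frac{1.3}{1.6} \approx 0.81
\]
```

That is a drop of roughly 81% between 2000 and 2017.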

Time series analysis (1)

A box plot across months gives a sense of the seasonal effect. We can see that the number of apprehensions in March has the widest range and the highest median.
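
The two quantities read off the box plot, the median and the range for each month, can also be computed directly. The following is a minimal sketch using only the Python standard library. The function name `monthly_summary`, the `(year, month, count)` record layout, and the numbers are hypothetical and are not taken from the CBP data.

```python
import statistics
from collections import defaultdict


def monthly_summary(records):
    """Map each month to (median, range) of its counts across years.

    records: iterable of (year, month, count) tuples.
    """
    by_month = defaultdict(list)
    for _year, month, count in records:
        by_month[month].append(count)
    return {
        month: (statistics.median(counts), max(counts) - min(counts))
        for month, counts in sorted(by_month.items())
    }


# Hypothetical records for two months over three years.
records = [
    (2000, 3, 900), (2001, 3, 700), (2002, 3, 400),
    (2000, 12, 300), (2001, 12, 250), (2002, 12, 200),
]

summary = monthly_summary(records)
for month, (median, spread) in summary.items():
    print(f"month {month:2d}: median = {median}, range = {spread}")
```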

Time series analysis (2)

In addition to the overall decline, seasonal fluctuation is another noticeable pattern in the plots. A reasonable explanation is that, holding other factors constant, harsh weather reduces attempts at illegal entry. The seasonal effects cause the changes among months but play little role in the rapid changes among years.
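
One simple way to separate the seasonal pattern from the year-to-year level is to express each month as a share of its year's total and average those shares across years. The sketch below uses only the Python standard library. It is an illustration rather than the decomposition used for the plots, and the function name `seasonal_shares` and the data are hypothetical.

```python
def seasonal_shares(table):
    """Average share of the annual total that falls in each of the 12 months.

    table: dict mapping year -> list of 12 monthly counts (January first).
    """
    shares = [0.0] * 12
    for counts in table.values():
        if len(counts) != 12:
            raise ValueError("each year needs exactly 12 monthly counts")
        total = sum(counts)
        for i, count in enumerate(counts):
            shares[i] += count / total
    return [s / len(table) for s in shares]


# Hypothetical data: the same monthly pattern at two very different levels.
pattern = [6, 8, 12, 11, 10, 8, 7, 7, 7, 8, 8, 8]
table = {
    2000: [10 * p for p in pattern],  # high-volume year
    2001: [4 * p for p in pattern],   # low-volume year
}

shares = seasonal_shares(table)
print([round(s, 2) for s in shares])
```

In this constructed example the shares are identical in both years even though the totals differ by a factor of 2.5, which is the sense in which a seasonal pattern can persist while the yearly level changes.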

Time series analysis (3)

The month plot displays the time series for each month from 2000 to 2017. The series for the individual months have very similar shapes but different magnitudes.

Time series prediction

Using an ARIMA model to predict future apprehensions
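
The slides do not state the model order or the software settings, so the following is only a minimal sketch of the idea under assumed choices: an ARIMA(1,1,0) model without drift, fitted by conditional least squares using only the Python standard library. The function names and the series are hypothetical. A real analysis would normally select the order from the data and use a statistics package.

```python
def fit_arima_110(series):
    """Estimate the AR coefficient of an ARIMA(1,1,0) model without drift.

    The series is differenced once, then phi is the least-squares slope
    (no intercept) of each difference on the previous difference.
    Returns (phi, last observed difference).
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    if len(diffs) < 2:
        raise ValueError("need at least 3 observations")
    num = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    if den == 0:
        raise ValueError("differences are all zero; phi is undefined")
    return num / den, diffs[-1]


def forecast(series, steps):
    """Forecast `steps` values ahead by accumulating predicted differences."""
    phi, last_diff = fit_arima_110(series)
    level = series[-1]
    predictions = []
    for _ in range(steps):
        last_diff = phi * last_diff  # predicted next difference
        level += last_diff           # undo the differencing
        predictions.append(level)
    return predictions


# Hypothetical annual totals (in thousands), NOT the CBP series.
annual = [1600.0, 1400.0, 1300.0, 1250.0, 1225.0]
print(forecast(annual, 2))
```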

Conclusions

The significant decline in border apprehensions since 2000 comes mainly from changes in specific sectors, such as Tucson, and at specific times of the year, such as March to May. More research on political and economic factors can be done to further explain the causes.
