library(tidyverse)
## -- Attaching packages -------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1     v purrr   0.3.4
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## Warning: package 'purrr' was built under R version 3.6.3
## -- Conflicts ----------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Border Crossing Data as it Relates to COVID-19

With the outbreak of COVID-19, our normal lives have ground to a halt. Countries all around the world have established some kind of shelter-in-place order to keep their citizens and those of other countries safe. I was wondering if this had an effect on the number of people crossing the border into the United States.

border_data<-read.csv("Border_Crossing_Entry_Data.csv", header = TRUE)
dim(border_data)
## [1] 355511      7
str(border_data)
## 'data.frame':    355511 obs. of  7 variables:
##  $ Port.Name: Factor w/ 116 levels "Alcan","Alexandria Bay",..: 62 90 96 10 54 17 87 113 36 41 ...
##  $ State    : Factor w/ 15 levels "AK","AZ","CA",..: 2 7 6 14 5 5 13 8 15 15 ...
##  $ Port.Code: int  2603 3426 3803 206 118 115 2307 3312 3013 3020 ...
##  $ Border   : Factor w/ 2 levels "US-Canada Border",..: 2 1 1 1 1 1 2 1 1 1 ...
##  $ Date     : Factor w/ 290 levels "1996-01-01","1996-02-01",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Measure  : Factor w/ 12 levels "Bus Passengers",..: 8 4 2 3 5 8 6 7 12 1 ...
##  $ Value    : int  0 7281 775 0 3879 0 0 0 183 682 ...

I don’t need to see exactly where people came through (the port name and state), nor do I need the ID number of the checkpoint (the port code), so I’ll drop those first three columns.

border_data<-border_data[-c(1:3)]
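
Dropping by position works here, but it would quietly remove the wrong columns if the file's column order ever changed. A minimal sketch of dropping the same three columns by name, using a made-up two-row stand-in for the real file (every value in `toy` is invented for illustration):

```r
# Toy stand-in with the same column names as the real data (values are made up)
toy <- data.frame(
  Port.Name = c("Port A", "Port B"),
  State     = c("AK", "AZ"),
  Port.Code = c(1L, 2L),
  Border    = c("US-Canada Border", "US-Mexico Border"),
  Date      = c("2017-01-01", "2017-01-01"),
  Measure   = c("Trucks", "Buses"),
  Value     = c(10L, 20L),
  stringsAsFactors = FALSE
)

# Name the columns to drop instead of relying on their positions
drop_cols <- c("Port.Name", "State", "Port.Code")
toy_small <- toy[, !(names(toy) %in% drop_cols)]

names(toy_small)
```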

I also don’t need the data going all the way back to 1996. I only want to look at recent years, from January 2017 through the most recent month in the file.

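# Convert the Date column (read in as a factor) to Date class
border_data$Date<-as.Date(border_data$Date)
# Keep only rows from January 2017 onward
border_data<-border_data%>%
  filter(Date >= as.Date("2017-01-01"))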

view(border_data)

That’s better.

Now I want to see the total number of each type of crossing in each month.

transport<-border_data%>%
  group_by(Date, Measure, Border)%>%
  summarise(Value=sum(Value, na.rm = TRUE))

str(transport)
## Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame':  912 obs. of  4 variables:
##  $ Date   : Date, format: "2017-01-01" "2017-01-01" ...
##  $ Measure: Factor w/ 12 levels "Bus Passengers",..: 1 1 2 2 3 3 4 4 5 5 ...
##  $ Border : Factor w/ 2 levels "US-Canada Border",..: 1 2 1 2 1 2 1 2 1 2 ...
##  $ Value  : int  84754 148050 4520 14191 13537 3424417 3117590 11736798 1758783 6286179 ...
##  - attr(*, "groups")=Classes 'tbl_df', 'tbl' and 'data.frame':   456 obs. of  3 variables:
##   ..$ Date   : Date, format: "2017-01-01" ...
##   ..$ Measure: Factor w/ 12 levels "Bus Passengers",..: 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ .rows  :List of 456
##   .. ..$ : int  1 2
##   .. ..$ : int  3 4
##   .. ..$ : int  5 6
##   .. .. [list output truncated]
##   ..- attr(*, ".drop")= logi TRUE
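
The long `groups` attribute in this output is there because `summarise()` removes only the last grouping variable, so the result is still grouped by Date and Measure. A minimal sketch, on invented toy rows, of adding `ungroup()` to get a plain tibble whose `str()` output stays short:

```r
library(dplyr)

# Toy rows (values are made up); two of them share the same Date, Measure and Border
toy <- data.frame(
  Date    = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01", "2020-02-01")),
  Measure = c("Trucks", "Trucks", "Buses", "Trucks"),
  Border  = c("US-Canada Border", "US-Canada Border",
              "US-Mexico Border", "US-Canada Border"),
  Value   = c(10L, 5L, 3L, 7L)
)

# Same aggregation as above, followed by ungroup() to drop the leftover grouping
toy_totals <- toy %>%
  group_by(Date, Measure, Border) %>%
  summarise(Value = sum(Value, na.rm = TRUE)) %>%
  ungroup()

str(toy_totals)
```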

Now, is the number of people crossing the United States’ borders trending up or down?

main<-ggplot(transport, aes(x = Date, y = Value, color = Measure))+
  geom_point()+
  geom_smooth(method = "lm", se = FALSE)+
  # Facet with a formula so ggplot looks up Border inside the plotted data
  facet_wrap(~Border)
main

Looking at the two panels, there is an obvious downward trend in personal vehicle crossings into the United States from Mexico, but it is difficult to tell from the plot alone whether any other crossing type is trending. So I fit a linear model and print its coefficients to put numbers on how the crossing types and the two borders compare.
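
Part of the difficulty is scale: the monthly totals range from a few thousand to over ten million, so the smaller series are flattened against the x-axis. One possible alternative, sketched here on invented toy data rather than the real `transport` table, is to give each crossing type its own row with a free y-axis:

```r
library(ggplot2)

# Toy data: two crossing types on very different scales (all values are made up)
dates <- seq(as.Date("2019-01-01"), by = "month", length.out = 6)
toy <- data.frame(
  Date    = rep(dates, times = 4),
  Measure = rep(c("Personal Vehicles", "Trains"), each = 6, times = 2),
  Border  = rep(c("US-Canada Border", "US-Mexico Border"), each = 12)
)
toy$Value <- ifelse(toy$Measure == "Personal Vehicles", 2e6, 150) +
  10 * seq_len(nrow(toy))

# One row per crossing type, one column per border, y-axis free within each row
p <- ggplot(toy, aes(x = Date, y = Value)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_grid(Measure ~ Border, scales = "free_y")
p
```

With twelve crossing types the real plot would be tall, so this is a trade-off between readability of each series and overall size.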

model<-lm(Value~Measure+Border+Measure*Border, data = transport)
anova(model)
## Analysis of Variance Table
## 
## Response: Value
##                 Df     Sum Sq    Mean Sq F value    Pr(>F)    
## Measure         11 4.9971e+15 4.5428e+14  5397.6 < 2.2e-16 ***
## Border           1 3.7631e+14 3.7631e+14  4471.2 < 2.2e-16 ***
## Measure:Border  11 1.3067e+15 1.1879e+14  1411.4 < 2.2e-16 ***
## Residuals      888 7.4737e+13 8.4164e+10                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model
## 
## Call:
## lm(formula = Value ~ Measure + Border + Measure * Border, data = transport)
## 
## Coefficients:
##                                               (Intercept)  
##                                                    144909  
##                                              MeasureBuses  
##                                                   -138439  
##                                        MeasurePedestrians  
##                                                   -105962  
##                        MeasurePersonal Vehicle Passengers  
##                                                   4035845  
##                                  MeasurePersonal Vehicles  
##                                                   2060181  
##                              MeasureRail Containers Empty  
##                                                    -82268  
##                               MeasureRail Containers Full  
##                                                      6307  
##                                   MeasureTrain Passengers  
##                                                   -121840  
##                                             MeasureTrains  
##                                                   -142991  
##                             MeasureTruck Containers Empty  
##                                                    -10052  
##                              MeasureTruck Containers Full  
##                                                    220612  
##                                             MeasureTrucks  
##                                                    335516  
##                                    BorderUS-Mexico Border  
##                                                     17160  
##                       MeasureBuses:BorderUS-Mexico Border  
##                                                    -10143  
##                 MeasurePedestrians:BorderUS-Mexico Border  
##                                                   3776921  
## MeasurePersonal Vehicle Passengers:BorderUS-Mexico Border  
##                                                   7562188  
##           MeasurePersonal Vehicles:BorderUS-Mexico Border  
##                                                   4071919  
##       MeasureRail Containers Empty:BorderUS-Mexico Border  
##                                                    -31433  
##        MeasureRail Containers Full:BorderUS-Mexico Border  
##                                                   -127133  
##            MeasureTrain Passengers:BorderUS-Mexico Border  
##                                                    -39397  
##                      MeasureTrains:BorderUS-Mexico Border  
##                                                    -18132  
##      MeasureTruck Containers Empty:BorderUS-Mexico Border  
##                                                      2237  
##       MeasureTruck Containers Full:BorderUS-Mexico Border  
##                                                     -1290  
##                      MeasureTrucks:BorderUS-Mexico Border  
##                                                     24925

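The model uses Bus Passengers at the US-Canada border as the base, so the intercept (144,909) is the average monthly count for that group. Every other coefficient is a difference in average monthly volume relative to that base. Because Date is not in the formula, none of these numbers is a slope over time. About half of the coefficients are negative, but that only says those ways of crossing carry less traffic on average than the base, not that fewer people are using them as time goes on. The plot is still the only evidence of a trend here.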
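
To put a number on a trend, Date has to enter the model. A minimal sketch of one way to do that, fitting a separate straight line per crossing type on invented toy data with known slopes (`month_index` is a helper column introduced here, not something in the original data):

```r
# Toy monthly series with known slopes (all values invented for illustration)
dates <- seq(as.Date("2019-01-01"), by = "month", length.out = 6)
toy <- data.frame(
  Date    = rep(dates, times = 2),
  Measure = rep(c("Trucks", "Buses"), each = 6),
  Value   = c(100 - 10 * (0:5), 50 + 5 * (0:5))
)

# Months elapsed since the first month, so a slope reads as "change per month"
toy$month_index <- match(toy$Date, sort(unique(toy$Date))) - 1

# Fit Value ~ month_index separately for each crossing type and keep the slope
slopes <- sapply(split(toy, toy$Measure), function(d) {
  coef(lm(Value ~ month_index, data = d))[["month_index"]]
})
slopes
```

A negative slope means that series fell over the period. On the real data the fits would be done for each combination of Measure and Border, and `summary()` on each fit would show whether the slope is distinguishable from zero.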

Of course, the pandemic covers only a small part of the period this data represents, so it’s too early to say whether COVID-19 is actually affecting the number of people coming into the United States. Looking closely at the plot, some points are lower than others as we enter 2020. Could this be a coincidence? Absolutely. But it could also be the start of a downward trend. Until more months of data are available, there is no way to tell whether these dips are significant.
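
One way to check whether the early-2020 points are unusually low, rather than just seasonal, would be to compare each month with the same month a year earlier. A minimal sketch on invented numbers (the `toy` values and the `pct_change` column are made up for illustration and say nothing about the real data):

```r
# Toy monthly totals for the same two months in consecutive years (values are made up)
toy <- data.frame(
  Date  = as.Date(c("2019-01-01", "2019-02-01", "2020-01-01", "2020-02-01")),
  Value = c(200, 220, 190, 165)
)
toy$year  <- as.integer(format(toy$Date, "%Y"))
toy$month <- as.integer(format(toy$Date, "%m"))

# Line up each 2020 month next to the same month in 2019
wide <- merge(
  toy[toy$year == 2019, c("month", "Value")],
  toy[toy$year == 2020, c("month", "Value")],
  by = "month", suffixes = c("_2019", "_2020")
)

# Year-over-year percentage change for each month
wide$pct_change <- 100 * (wide$Value_2020 - wide$Value_2019) / wide$Value_2019
wide
```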