library(tidyverse)
## -- Attaching packages -------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1 v purrr 0.3.4
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 1.0.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## Warning: package 'purrr' was built under R version 3.6.3
## -- Conflicts ----------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
With the outbreak of COVID-19, our normal lives have ground to a halt. Countries all around the world have established some kind of shelter-in-place order to keep their citizens and those of other countries safe. I was wondering if this had an effect on immigration to the United States, using border-crossing counts as a proxy.
border_data<-read.csv("Border_Crossing_Entry_Data.csv", header = TRUE)
dim(border_data)
## [1] 355511 7
str(border_data)
## 'data.frame': 355511 obs. of 7 variables:
## $ Port.Name: Factor w/ 116 levels "Alcan","Alexandria Bay",..: 62 90 96 10 54 17 87 113 36 41 ...
## $ State : Factor w/ 15 levels "AK","AZ","CA",..: 2 7 6 14 5 5 13 8 15 15 ...
## $ Port.Code: int 2603 3426 3803 206 118 115 2307 3312 3013 3020 ...
## $ Border : Factor w/ 2 levels "US-Canada Border",..: 2 1 1 1 1 1 2 1 1 1 ...
## $ Date : Factor w/ 290 levels "1996-01-01","1996-02-01",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Measure : Factor w/ 12 levels "Bus Passengers",..: 8 4 2 3 5 8 6 7 12 1 ...
## $ Value : int 0 7281 775 0 3879 0 0 0 183 682 ...
I don’t necessarily need to see where exactly people came through, nor do I need the ID number of the checkpoint, so I’ll get rid of those.
border_data<-border_data[-c(1:3)]
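Dropping columns by position works here, but it depends on the column order of the CSV. As a minimal sketch of the same step done by name, here is a toy data frame that borrows the column names from the `str()` output above; the rows and port codes are made up for illustration.

```r
# Hypothetical toy data; only the column names come from the real file
toy <- data.frame(
  Port.Name = c("Alcan", "Alexandria Bay"),
  State     = c("AK", "NY"),
  Port.Code = c(1L, 2L),  # made-up codes
  Border    = c("US-Canada Border", "US-Canada Border"),
  Value     = c(10L, 20L)
)

# Drop the location columns by name instead of by position
drop_cols <- c("Port.Name", "State", "Port.Code")
toy_small <- toy[, !(names(toy) %in% drop_cols), drop = FALSE]
names(toy_small)
```

Selecting by name keeps working if the file ever gains or reorders columns.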
I also don’t need all the data going back to 1996. I only want to look at recent years through today.
border_data$Date<-as.Date(border_data$Date)
border_data<-border_data%>%
filter(Date >= as.Date("2017-01-01"))
view(border_data)
That’s better.
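The date step can be checked on a small example. This is a sketch on made-up rows, assuming the dates arrive as a factor (as `read.csv()` produced by default before R 4.0); it converts them to `Date` and compares against an explicit `Date` cutoff rather than a character string.

```r
# Hypothetical toy data with dates stored as a factor
toy <- data.frame(
  Date  = c("2016-12-01", "2017-01-01", "2018-06-01"),
  Value = c(5L, 7L, 9L),
  stringsAsFactors = TRUE
)

# Go through character so the conversion does not depend on factor handling
toy$Date <- as.Date(as.character(toy$Date))

# Keep rows on or after the cutoff
cutoff <- as.Date("2017-01-01")
recent <- toy[toy$Date >= cutoff, ]
recent
```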
Now I want to see the total number of each type of crossing in each month.
transport<-border_data%>%
group_by(Date, Measure, Border)%>%
summarise(Value=sum(Value, na.rm = TRUE))
str(transport)
## Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame': 912 obs. of 4 variables:
## $ Date : Date, format: "2017-01-01" "2017-01-01" ...
## $ Measure: Factor w/ 12 levels "Bus Passengers",..: 1 1 2 2 3 3 4 4 5 5 ...
## $ Border : Factor w/ 2 levels "US-Canada Border",..: 1 2 1 2 1 2 1 2 1 2 ...
## $ Value : int 84754 148050 4520 14191 13537 3424417 3117590 11736798 1758783 6286179 ...
## - attr(*, "groups")=Classes 'tbl_df', 'tbl' and 'data.frame': 456 obs. of 3 variables:
## ..$ Date : Date, format: "2017-01-01" ...
## ..$ Measure: Factor w/ 12 levels "Bus Passengers",..: 1 2 3 4 5 6 7 8 9 10 ...
## ..$ .rows :List of 456
## .. ..$ : int 1 2
## .. ..$ : int 3 4
## .. ..$ : int 5 6
## .. .. [list output truncated]
## ..- attr(*, ".drop")= logi TRUE
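The `str()` output shows that the summarised table is still grouped by `Date` and `Measure`. As a small sketch of the same aggregation on made-up rows, the pipeline below sums `Value` per month, measure, and border, then calls `ungroup()` so later steps see a plain table. The toy values are assumptions chosen for illustration.

```r
library(dplyr)

# Hypothetical toy data with the same columns as border_data
toy <- data.frame(
  Date    = as.Date(c("2017-01-01", "2017-01-01", "2017-01-01", "2017-02-01")),
  Measure = c("Trucks", "Trucks", "Buses", "Trucks"),
  Border  = c("US-Canada Border", "US-Canada Border",
              "US-Canada Border", "US-Mexico Border"),
  Value   = c(10L, 5L, NA, 7L)
)

# Sum crossings per month, measure, and border; missing values are skipped
monthly <- toy %>%
  group_by(Date, Measure, Border) %>%
  summarise(Value = sum(Value, na.rm = TRUE)) %>%
  ungroup()

monthly
```

The two Trucks rows for January at the US-Canada border collapse into one row, and the Buses row with a missing value sums to zero because of `na.rm = TRUE`.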
Now, is there an increasing or decreasing trend in the number of crossings at the United States’ borders?
main<-ggplot(transport, aes(x = Date, y = Value, color = Measure))+
geom_point()+
geom_smooth(method = "lm", se = FALSE)+
facet_wrap(~Border)
main
Looking at the two panels, there is an obvious downward trend in personal vehicle crossings into the United States from Mexico, but it is hard to tell by eye whether any other measure shows a trend. So I fit a linear model and print it to put numbers on the comparison. Note that this model has no Date term, so its coefficients describe differences in average monthly counts between measures and borders rather than slopes over time.
model<-lm(Value~Measure+Border+Measure*Border, data = transport)
anova(model)
## Analysis of Variance Table
##
## Response: Value
## Df Sum Sq Mean Sq F value Pr(>F)
## Measure 11 4.9971e+15 4.5428e+14 5397.6 < 2.2e-16 ***
## Border 1 3.7631e+14 3.7631e+14 4471.2 < 2.2e-16 ***
## Measure:Border 11 1.3067e+15 1.1879e+14 1411.4 < 2.2e-16 ***
## Residuals 888 7.4737e+13 8.4164e+10
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model
##
## Call:
## lm(formula = Value ~ Measure + Border + Measure * Border, data = transport)
##
## Coefficients:
## (Intercept)
## 144909
## MeasureBuses
## -138439
## MeasurePedestrians
## -105962
## MeasurePersonal Vehicle Passengers
## 4035845
## MeasurePersonal Vehicles
## 2060181
## MeasureRail Containers Empty
## -82268
## MeasureRail Containers Full
## 6307
## MeasureTrain Passengers
## -121840
## MeasureTrains
## -142991
## MeasureTruck Containers Empty
## -10052
## MeasureTruck Containers Full
## 220612
## MeasureTrucks
## 335516
## BorderUS-Mexico Border
## 17160
## MeasureBuses:BorderUS-Mexico Border
## -10143
## MeasurePedestrians:BorderUS-Mexico Border
## 3776921
## MeasurePersonal Vehicle Passengers:BorderUS-Mexico Border
## 7562188
## MeasurePersonal Vehicles:BorderUS-Mexico Border
## 4071919
## MeasureRail Containers Empty:BorderUS-Mexico Border
## -31433
## MeasureRail Containers Full:BorderUS-Mexico Border
## -127133
## MeasureTrain Passengers:BorderUS-Mexico Border
## -39397
## MeasureTrains:BorderUS-Mexico Border
## -18132
## MeasureTruck Containers Empty:BorderUS-Mexico Border
## 2237
## MeasureTruck Containers Full:BorderUS-Mexico Border
## -1290
## MeasureTrucks:BorderUS-Mexico Border
## 24925
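The model uses the US-Canada border as the baseline, with Bus Passengers as the baseline measure. About half of the coefficients are negative. Because the model has no Date term, a negative coefficient means a lower average monthly count than the baseline, not a downward trend over time. Whether other ways of crossing the border are also declining, beyond what the visual shows, would have to be checked with a model that includes Date.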
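One way to estimate trends over time directly is to put a time variable in the model and let it interact with the measure. The sketch below is not the model fitted above; it uses made-up data with two hypothetical measures, "A" and "B", and a helper column `Month` (months elapsed since the first observation) that does not exist in the original data.

```r
# Hypothetical toy data: 12 months for two made-up measures
months <- seq(as.Date("2019-01-01"), by = "month", length.out = 12)
toy <- data.frame(
  Date    = rep(months, times = 2),
  Month   = rep(0:11, times = 2),        # months elapsed, used as the time term
  Measure = rep(c("A", "B"), each = 12),
  Value   = c(1000 - 20 * (0:11),        # measure A falls by 20 per month
              rep(500, 12))              # measure B stays flat
)

# Month * Measure gives each measure its own intercept and slope
fit <- lm(Value ~ Month * Measure, data = toy)

# Slope for the baseline measure A, and for B (baseline slope + interaction)
slope_A <- coef(fit)[["Month"]]
slope_B <- coef(fit)[["Month"]] + coef(fit)[["Month:MeasureB"]]
c(A = slope_A, B = slope_B)
```

With the real data, the same idea would mean adding a time term to the formula and reading the per-measure slopes from the time coefficient and its interactions, rather than from the `Measure` and `Border` coefficients alone.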
Of course, we are still in the early stages of this pandemic relative to the time period this data covers, so it’s difficult to say yet whether COVID-19 is actually affecting the number of people coming into the United States. Looking closely at the visuals, though, some points are lower than others as we enter 2020. Could this be a coincidence? Absolutely. But it could also be the start of a downward trend in people coming to the United States. Again, it is difficult to say whether these drops are really significant, simply because we don’t have data for the coming months and the pandemic is still relatively new in this data set’s history.