Data

This report provides statistics on the number of illegal immigrants arrested or apprehended by the Border Patrol in each division (or sector) along the United States borders with Canada, Mexico, and the Caribbean islands. These data are a partial measure of the flow of people entering the United States illegally.

Loading libraries

library(dplyr)      # data manipulation
library(tidyr)      # reshaping (gather / separate)
library(ggplot2)    # plotting
library(networkD3)  # sankeyNetwork(); supersedes the archived d3Network package

Data Cleanse

The raw data is in a wide, unformatted layout, with one column per year and arrest type, so it has to be cleansed and reshaped before any analysis can be done.

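arrests <- read.csv("arrests.csv", stringsAsFactors = FALSE)
# read.csv replaces spaces and other invalid characters in the headers with
# dots (and prefixes headers that start with a digit with "X");
# turn the dots back into spaces
names(arrests) <- gsub("[.]", " ", names(arrests))

# Reshape from wide (one column per year and arrest type) to long format:
# split each "X<year> <type>" header into a Year and a Type column
arrests.clns <- arrests %>%
  gather(key, value, -Border, -Sector, -`State Territory`) %>%
  separate(key, into = c("Year", "Type"), sep = " ", extra = "merge") %>%
  na.omit()

# Drop the "X" prefix from the years and normalise the type labels
arrests.clns$Year <- gsub("X", "", arrests.clns$Year)
arrests.clns$Type <- trimws(tolower(arrests.clns$Type))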
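With current tidyr, the same reshape can be sketched with `pivot_longer()`, which supersedes `gather()`. The data frame below is hypothetical: the sectors, the type labels ("All Illegal Immigrants", "Mexicans Only") and the numbers are invented to mimic the shape of the headers after the `gsub()` step, not taken from arrests.csv. A regular expression in `names_pattern` splits each header into year and type in one step, and `values_drop_na = TRUE` roughly takes the place of `na.omit()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy input shaped like arrests.csv after the gsub() step:
# one column per year/type pair (names and values are invented)
toy <- data.frame(
  Border = c("Mexico", "Mexico"),
  Sector = c("Tucson", "Yuma"),
  `State Territory` = c("Arizona", "Arizona"),
  `X2000 All Illegal Immigrants` = c(100, 50),
  `X2000 Mexicans Only` = c(90, 40),
  `X2001 All Illegal Immigrants` = c(80, NA),
  check.names = FALSE
)

toy.long <- toy %>%
  pivot_longer(
    cols = -c(Border, Sector, `State Territory`),
    names_to = c("Year", "Type"),
    names_pattern = "^X(\\d{4}) (.*)$",  # group 1 -> Year, group 2 -> Type
    values_to = "value",
    values_drop_na = TRUE                # drop the missing measurement
  ) %>%
  mutate(Type = trimws(tolower(Type)))
```

The toy frame has six year/type cells, one of them missing, so `toy.long` ends up with five rows.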

Data Manipulation for networks

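Here the data is turned into the node and link tables that the Sankey diagram needs. For each arrest type and sector, the arrests are averaged over all years, and the ten sectors with the most arrests are kept for each type, to show visually which sectors have the most porous borders.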
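The "top ten per type" step depends on the data still being grouped by `source` after `summarize()` (which drops only the last grouping variable), so `rank()` is computed separately within each arrest type. A minimal sketch on a hypothetical toy table (labels and values invented, keeping the top 2 instead of the top 10) shows the effect:

```r
library(dplyr)

# Hypothetical averaged arrests per type (source) and sector (target)
toy <- data.frame(
  source = rep(c("all illegal immigrants", "mexicans only"), each = 3),
  target = rep(c("Tucson", "Yuma", "El Paso"), times = 2),
  value  = c(300, 120, 200, 250, 100, 180),
  stringsAsFactors = FALSE
)

top2 <- toy %>%
  group_by(source) %>%                # rank within each arrest type
  mutate(rank = rank(desc(value))) %>%
  filter(rank <= 2) %>%
  arrange(source, rank) %>%
  ungroup()
```

`dplyr::slice_max(value, n = 10)` is another way to keep the top rows per group; by default it keeps tied rows, so it can return more than `n` rows per group.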

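# For each arrest type (source) and sector (target), average the arrests over
# all years, excluding the aggregate "United States" and "All" rows, then keep
# the ten sectors with the most arrests for each type
arrests.net <- arrests.clns %>%
  filter(Border != "United States" & Sector != "All") %>%
  select(Type, Sector, value) %>%
  rename(source = Type, target = Sector) %>%
  group_by(source, target) %>%
  summarize(value = mean(value)) %>%    # result stays grouped by source
  na.omit() %>%
  mutate(rank = rank(desc(value))) %>%  # rank sectors within each type
  arrange(rank) %>%
  filter(rank <= 10) %>%
  ungroup()

# networkD3 expects a nodes data.frame and a links data.frame whose
# source/target columns are zero-based indices into the nodes table
Nodes <- data.frame(
  name = c(unique(arrests.net$source), unique(arrests.net$target)),
  stringsAsFactors = FALSE
)
Links <- as.data.frame(arrests.net)
Links$source <- match(Links$source, Nodes$name) - 1
Links$target <- match(Links$target, Nodes$name) - 1

sankeyNetwork(
  Links = Links,
  Nodes = Nodes,
  Source = "source",
  Target = "target",
  Value = "value",
  NodeID = "name",
  nodeWidth = 50,
  fontSize = 14
)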
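networkD3 hands the links to D3 in JavaScript, so the `Source` and `Target` columns must be zero-based row indices into the nodes table, which is why `match(...) - 1` is used. A base-R sketch on a hypothetical three-link table (labels and values invented) shows the mapping from names to indices:

```r
# Hypothetical links from arrest types (source) to sectors (target)
links <- data.frame(
  source = c("all illegal immigrants", "all illegal immigrants", "mexicans only"),
  target = c("Tucson", "Yuma", "Tucson"),
  value  = c(300, 120, 250),
  stringsAsFactors = FALSE
)

# One node per distinct type, followed by one node per distinct sector
nodes <- data.frame(
  name = c(unique(links$source), unique(links$target)),
  stringsAsFactors = FALSE
)

# match() is 1-based; subtracting 1 gives the 0-based ids D3 expects
links$source <- match(links$source, nodes$name) - 1
links$target <- match(links$target, nodes$name) - 1
```

Here the nodes are the two types followed by "Tucson" and "Yuma", so the links become 0 -> 2, 0 -> 3 and 1 -> 2.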

Observation

Tucson, on the Mexican border, appears to be the sector with the most arrests. Arrest counts do not map directly onto the total influx of immigrants, because other factors, such as tighter patrolling, also drive the number of arrests (although it is very likely that Tucson also sees the largest influx).

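# Average arrests per year for each arrest type
# (this averages over every row, including the aggregate "All" sector and
# "United States" rows)
arrests.plot <- arrests.clns %>%
  group_by(Year, Type) %>%
  summarize(avg.value = mean(value))

ggplot(arrests.plot) +
  aes(Year, avg.value, color = Type) +
  geom_point(size = 2) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.y = element_blank(),
    axis.text.x = element_blank()
  ) +
  labs(title = "Average arrests over the years", y = "Average Arrests")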

Observation

Total arrests have fallen considerably over the years (despite more vigilant patrolling and tighter borders). Interestingly, the gap between the total number of immigrants arrested and the number of Mexicans arrested has widened, indicating that the number of Mexicans crossing the border illegally has fallen.
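The widening gap can also be computed directly instead of being read off the plot. This is a sketch only: it assumes the long Year/Type/value format produced by the cleansing step and the type labels "all illegal immigrants" and "mexicans only" (assumed names), and the toy values are invented. It averages per year and type, spreads the types into columns with `tidyr::pivot_wider()`, and subtracts.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format rows (assumed type labels, invented values)
toy.long <- data.frame(
  Year  = c("2000", "2000", "2001", "2001"),
  Type  = c("all illegal immigrants", "mexicans only",
            "all illegal immigrants", "mexicans only"),
  value = c(100, 90, 80, 60),
  stringsAsFactors = FALSE
)

gap <- toy.long %>%
  group_by(Year, Type) %>%
  summarize(avg.value = mean(value), .groups = "drop") %>%
  pivot_wider(names_from = Type, values_from = avg.value) %>%  # one column per type
  mutate(gap = `all illegal immigrants` - `mexicans only`)
```

On these toy rows the gap grows from 10 in 2000 to 20 in 2001. On the real data, a growing `gap` column would support the observation above.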