Analysis of U.S. Northern versus Southern Border Truck Crossings (2020-2024)

Introduction

Research Question: Is there a statistically significant difference in the number of truck border crossings between the northern and southern U.S. borders from 2020 to 2024? Specifically, we ask whether the mean monthly number of truck crossings at the U.S.-Canada (northern) border differs from that at the U.S.-Mexico (southern) border over the five-year span 2020-2024.

To investigate this, we use the Border Crossing Entry Data dataset from the U.S. Bureau of Transportation Statistics (BTS). The dataset reports monthly counts of inbound crossings at U.S. land ports of entry on both the Canadian and Mexican borders. Each row gives the number of crossings for a specific port, month, and type of crossing. The full dataset has 404,543 observations and 10 variables:

  • Port name
  • Port code (unique alphanumeric identifier assigned to a specific location)
  • State
  • Border (US-Canada or US-Mexico)
  • Date (month and year of the crossings, stored as text such as "Jan 2024")
  • Measure (type of transport inbound to the U.S.)
  • Value (the count of crossings for a specific port, month, and measure; after keeping only trucks, we sum Value across all ports for each border and month to obtain total_trucks, which we then compare)
  • Latitude
  • Longitude
  • Point (Direct coordinate, which combines Latitude and Longitude)

For our analysis, we will focus only on the Border, Date, Measure, and Value columns.

https://catalog.data.gov/dataset/border-crossing-entry-data-683ae

Data Analysis

In this section, we clean and explore the data to prepare it for our statistical test. We first load the dataset and inspect its structure. We then filter the data to include only the relevant years (2020-2024) and the relevant measure (trucks only). Next, we use the dplyr package to aggregate the records into the total number of truck crossings per month for each border. Finally, we visualize the monthly truck crossings with a boxplot and a bar chart to check for noticeable differences.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
border_data <- read.csv("Border_Crossing_Entry_Data.csv")


dim(border_data)
## [1] 404543     10
head(border_data)
##      Port.Name     State Port.Code           Border     Date
## 1      Jackman     Maine       104 US-Canada Border Jan 2024
## 2     Porthill     Idaho      3308 US-Canada Border Apr 2024
## 3     San Luis   Arizona      2608 US-Mexico Border Apr 2024
## 4 Willow Creek   Montana      3325 US-Canada Border Jan 2024
## 5      Warroad Minnesota      3423 US-Canada Border Jan 2024
## 6     Whitlash   Montana      3321 US-Canada Border Jan 2024
##                       Measure Value Latitude Longitude
## 1                      Trucks  6556   45.806   -70.397
## 2                      Trucks    98   49.000  -116.499
## 3                       Buses    10   32.485  -114.782
## 4                 Pedestrians     2   49.000  -109.731
## 5 Personal Vehicle Passengers  9266   48.999   -95.377
## 6           Personal Vehicles    29   48.997  -111.258
##                           Point
## 1  POINT (-70.396722 45.805661)
## 2  POINT (-116.49925 48.999861)
## 3   POINT (-114.7822222 32.485)
## 4 POINT (-109.731333 48.999972)
## 5     POINT (-95.376555 48.999)
## 6  POINT (-111.257916 48.99725)

We will now filter the data for our analysis, keeping only truck records from 2020 through 2024.

# Create a year column by taking the last 4 characters from the Date string column
border_data$Year <- as.numeric(substr(border_data$Date, 5, 8))

trucks_data <- border_data |>
  filter(Year >= 2020 & Year <= 2024, Measure == "Trucks")

dim(trucks_data)
## [1] 5776   11
head(trucks_data)
##        Port.Name        State Port.Code           Border     Date Measure Value
## 1        Jackman        Maine       104 US-Canada Border Jan 2024  Trucks  6556
## 2       Porthill        Idaho      3308 US-Canada Border Apr 2024  Trucks    98
## 3        Warroad    Minnesota      3423 US-Canada Border Jan 2024  Trucks   837
## 4      Wildhorse      Montana      3323 US-Canada Border Jan 2024  Trucks    20
## 5 Fort Fairfield        Maine       107 US-Canada Border Jan 2024  Trucks   525
## 6        Fortuna North Dakota      3417 US-Canada Border Jan 2024  Trucks   228
##   Latitude Longitude                         Point Year
## 1   45.806   -70.397  POINT (-70.396722 45.805661) 2024
## 2   49.000  -116.499  POINT (-116.49925 48.999861) 2024
## 3   48.999   -95.377     POINT (-95.376555 48.999) 2024
## 4   48.999  -110.215 POINT (-110.215083 48.999361) 2024
## 5   46.765   -67.789  POINT (-67.789471 46.765323) 2024
## 6   49.000  -103.809  POINT (-103.80925 48.999555) 2024

After filtering our data, we have a subset, trucks_data, containing only truck crossing records from 2020-2024. We will now aggregate the data by month for each border.
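
Because Date is stored as text such as "Jan 2024", any table grouped by Date is ordered alphabetically by month name (Apr before Aug before Dec) rather than chronologically. This does not affect the means we compare, but it matters for any time-ordered view of the data. The following is a minimal sketch of one way to build a sortable date, assuming every Date value follows the three-letter-month-plus-year pattern seen in head(border_data). The names example_dates, month_num, year_num, and month_start are hypothetical and are not part of the analysis above.

```r
# Example Date strings in the same "Mon YYYY" format as the dataset
example_dates <- c("Apr 2024", "Jan 2020", "Dec 2022")

# Month number from the three-letter abbreviation (month.abb is built into R)
month_num <- match(substr(example_dates, 1, 3), month.abb)

# Year from characters 5 to 8, the same positions used for the Year column
year_num <- as.integer(substr(example_dates, 5, 8))

# First day of each month as a Date, which sorts chronologically
month_start <- as.Date(sprintf("%04d-%02d-01", year_num, month_num))

sort(month_start)
```

Matching against month.abb avoids relying on the system locale, which as.Date() with a "%b" format would depend on.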

monthly_trucks <- trucks_data |>
  group_by(Border, Date) |>
  summarise(total_trucks = sum(Value))
## `summarise()` has grouped output by 'Border'. You can override using the
## `.groups` argument.
monthly_trucks
## # A tibble: 120 × 3
## # Groups:   Border [2]
##    Border           Date     total_trucks
##    <chr>            <chr>           <int>
##  1 US-Canada Border Apr 2020       316002
##  2 US-Canada Border Apr 2021       456864
##  3 US-Canada Border Apr 2022       459328
##  4 US-Canada Border Apr 2023       433683
##  5 US-Canada Border Apr 2024       483095
##  6 US-Canada Border Aug 2020       465088
##  7 US-Canada Border Aug 2021       482579
##  8 US-Canada Border Aug 2022       494726
##  9 US-Canada Border Aug 2023       457659
## 10 US-Canada Border Aug 2024       466822
## # ℹ 110 more rows

The aggregated table has 120 rows, which is consistent with the 60 months we expect for each of the two borders (Jan 2020 to Dec 2024). We will now compute summary statistics for each border to confirm that the monthly totals look reasonable.

# Calculate summary statistics of monthly truck counts by border
monthly_summary <- monthly_trucks |>
  group_by(Border) |>
  summarise(
    mean_trucks = mean(total_trucks),
    median_trucks = median(total_trucks),
    sd_trucks = sd(total_trucks),
    min_trucks = min(total_trucks),
    max_trucks = max(total_trucks)
  )
monthly_summary
## # A tibble: 2 × 6
##   Border           mean_trucks median_trucks sd_trucks min_trucks max_trucks
##   <chr>                  <dbl>         <dbl>     <dbl>      <int>      <int>
## 1 US-Canada Border     455899.       461550.    35235.     316002     508440
## 2 US-Mexico Border     592466.       596331     53019.     402091     677862

US-Canada Border: mean = 455899.2, median = 461549.5, standard deviation = 35235.49, min = 316002, max = 508440

US-Mexico Border: mean = 592465.8, median = 596331.0, standard deviation = 53018.64, min = 402091, max = 677862
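
The summary statistics do not by themselves confirm that each border contributes exactly 60 monthly totals. A minimal sketch of such a check is shown here on a toy stand-in for monthly_trucks, written in base R so that it runs without extra packages. The data frame toy_monthly and the names months_per_border and any_duplicates are hypothetical; on the real data the same two checks would be applied to monthly_trucks.

```r
# Toy stand-in for monthly_trucks: one row per border and month, 2020-2024
toy_monthly <- expand.grid(
  Border = c("US-Canada Border", "US-Mexico Border"),
  Month = month.abb,
  Year = 2020:2024,
  stringsAsFactors = FALSE
)
toy_monthly$Date <- paste(toy_monthly$Month, toy_monthly$Year)

# Number of rows (months) per border
months_per_border <- table(toy_monthly$Border)
months_per_border

# No border-month combination should appear more than once
any_duplicates <- anyDuplicated(toy_monthly[, c("Border", "Date")]) > 0

# Stop with an error if either expectation fails
stopifnot(all(months_per_border == 60), !any_duplicates)
```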

We will now visualize the distributions to see the difference between the two borders more clearly:

Boxplot:

library(ggplot2)

# Boxplot of monthly truck totals by border
ggplot(monthly_trucks, aes(x = Border, y = total_trucks, fill = Border)) +
  geom_boxplot() +
  labs(title = "Distribution of Monthly Truck Crossings (2020-2024)",
       x = "Border Region", y = "Monthly Truck Crossings") +
  theme_minimal() +
  guides(fill = "none")

The boxplot above shows monthly truck crossings for the northern border (left) and southern border (right). The box and whiskers for the southern border sit higher on the y-axis, indicating a greater median and higher quartiles (a boxplot does not display the mean). This visual evidence suggests that truck traffic is typically higher at the southern border. The spread also appears greater for the southern border, suggesting larger month-to-month fluctuation in U.S.-inbound trucks than in the north.

Bar Chart:

library(ggplot2)

# Calculate mean monthly trucks by each border
border_means <- monthly_trucks |>
  group_by(Border) |>
  summarise(mean_trucks = mean(total_trucks))

ggplot(border_means, aes(x = Border, y = mean_trucks, fill = Border)) +
  geom_col(color = "black") +
  labs(title = "Average Monthly Truck Crossings (2020–2024)",
       x = "Border Region",
       y = "Average Monthly Trucks") +
  theme_minimal() +
  guides(fill = "none")

The bar chart above shows a clear gap in average monthly inbound truck traffic: the southern border averages about 137,000 more trucks per month than the northern border (roughly 592,466 versus 455,899).

Statistical Analysis

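Although the plots look convincing, visual evidence alone does not tell us whether the gap could be due to chance. We therefore perform an independent two-sample t-test comparing the mean monthly truck crossings at the northern and southern borders. R's t.test() defaults to Welch's version of the test, which does not assume equal variances in the two groups; this suits our data, since the southern border's standard deviation is noticeably larger.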

Hypothesis:

We will define the population mean monthly truck crossings for the northern border as \(\mu_N\) and for the southern border as \(\mu_S\). The hypotheses are:

  • \(H_0\): \(\mu_N\) = \(\mu_S\)
  • \(H_a\): \(\mu_N\) \(\neq\) \(\mu_S\)

We will use a significance level of \(\alpha\) = 0.05.
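
For reference, the Welch form of the two-sample statistic (the version named in the R output below) can be written in terms of each border's sample mean \(\bar{x}\), sample standard deviation \(s\), and number of months \(n\). The symbols \(\bar{x}\), \(s\), \(n\), and \(\nu\) (the approximate degrees of freedom) are notation introduced here for illustration:

\[
t = \frac{\bar{x}_N - \bar{x}_S}{\sqrt{\dfrac{s_N^2}{n_N} + \dfrac{s_S^2}{n_S}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_N^2}{n_N} + \dfrac{s_S^2}{n_S}\right)^2}{\dfrac{\left(s_N^2/n_N\right)^2}{n_N - 1} + \dfrac{\left(s_S^2/n_S\right)^2}{n_S - 1}}
\]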

We will now conduct the t-test:

t_test_result <- t.test(total_trucks ~ Border, data = monthly_trucks)
t_test_result
## 
##  Welch Two Sample t-test
## 
## data:  total_trucks by Border
## t = -16.617, df = 102.61, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group US-Canada Border and group US-Mexico Border is not equal to 0
## 95 percent confidence interval:
##  -152866.6 -120266.6
## sample estimates:
## mean in group US-Canada Border mean in group US-Mexico Border 
##                       455899.2                       592465.8

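With a p-value below 2.2e-16, far smaller than our threshold of \(\alpha\) = 0.05, we reject the null hypothesis in favor of the alternative that mean inbound U.S. truck traffic differs between the northern and southern borders. The 95% confidence interval for the difference in means (northern minus southern) runs from about -152,867 to -120,267 trucks per month, so the data point to a southern mean that is higher by roughly 120,000 to 153,000 monthly crossings.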
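
As a consistency check, the test statistic can be recomputed by hand from the rounded summary statistics reported earlier. This is a sketch, not part of the original analysis: it assumes 60 monthly observations per border (inferred from the 120 rows of monthly_trucks), and all variable names are hypothetical.

```r
# Rounded summary statistics as reported in the summary table
mean_n <- 455899.2; sd_n <- 35235.49; n_n <- 60   # US-Canada (northern)
mean_s <- 592465.8; sd_s <- 53018.64; n_s <- 60   # US-Mexico (southern)

# Variance of each sample mean
var_n <- sd_n^2 / n_n
var_s <- sd_s^2 / n_s

# Difference in means (northern minus southern) and its standard error
mean_diff <- mean_n - mean_s
std_err <- sqrt(var_n + var_s)

# Welch t statistic
t_stat <- mean_diff / std_err

# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (var_n + var_s)^2 / (var_n^2 / (n_n - 1) + var_s^2 / (n_s - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- mean_diff + c(-1, 1) * qt(0.975, df_welch) * std_err

round(c(t = t_stat, df = df_welch), 2)
round(ci)
```

Because the inputs are rounded, the results should agree with the t.test() output only up to small rounding differences.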

Conclusion and Future Implications

In this analysis, we found a clear and statistically significant difference in truck traffic between the U.S. northern and southern borders from 2020 to 2024. The southern border saw on average about 137,000 more truck crossings per month than the northern border, a highly significant difference (p < 0.001). The visualizations support this conclusion: the distribution of monthly truck counts at the southern border is shifted well above that of the northern border, although the two ranges overlap (the southern minimum of 402,091 falls below the northern maximum of 508,440).

Trucks are clear indicators of trade and supply chain movement, so these findings likely reflect higher trade volume and economic activity through the U.S.-Mexico border than through the U.S.-Canada border during this period. These results matter for transportation planning and resource allocation: knowing that the southern border carries heavier truck traffic can inform infrastructure investment (such as expanding port staffing and facilities) and security planning. The size of the gap also points to differing trade patterns between the northern and southern borders, possibly due to factors like population distribution, industrial supply chains, and differing foreign policies.

This study could be extended in several ways. For example, we could examine how truck crossing volumes changed year by year, notably around 2020 (the onset of COVID-19). A time series analysis could reveal how sharply truck traffic fell and how quickly it rebounded, and whether trade recovery rates differed between the two borders. Another study could examine seasonal patterns, comparing specific ranges of months to check for spikes in truck crossings. Overall, this dataset is versatile and informative, and it could be filtered and aggregated in many other ways to assess further claims.
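
A year-by-year comparison could start from a table of mean monthly crossings per border and year. The following is a minimal sketch of that aggregation in base R. The values in toy_monthly are made up purely for illustration and are not taken from the BTS data; the names toy_monthly and yearly_means are hypothetical.

```r
# Hypothetical monthly totals (made-up values, for illustration only)
toy_monthly <- data.frame(
  Border = rep(c("US-Canada Border", "US-Mexico Border"), each = 4),
  Date = rep(c("Jan 2020", "Feb 2020", "Jan 2021", "Feb 2021"), times = 2),
  total_trucks = c(100, 120, 130, 150, 200, 220, 260, 280)
)

# Year from characters 5 to 8 of the "Mon YYYY" string
toy_monthly$Year <- as.integer(substr(toy_monthly$Date, 5, 8))

# Mean monthly crossings for each border and year
yearly_means <- aggregate(total_trucks ~ Border + Year,
                          data = toy_monthly, FUN = mean)
yearly_means
```

On the real data, the same call applied to monthly_trucks (after adding a Year column) would give one mean per border for each of the five years.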

References

Dataset: U.S. Bureau of Transportation Statistics. (n.d.). Border crossing/entry data. U.S. Department of Transportation. Retrieved from https://www.bts.gov/content/border-crossingentry-data (direct link: https://catalog.data.gov/dataset/border-crossing-entry-data-683ae)