Immigration Data Management and Analytical Insights using MySQL and R

Project Report
  • Author: Moore Isaiah
  • Date: 2022-05-18

Introduction

This project showcases a complete database-driven solution for managing and analyzing immigration and border-related data using MySQL. It simulates real-world operations of a Department of Homeland Security (DHS)-like system and includes structured data design, spatial data integration, and analytical queries. The project also integrates data visualization using R for better insight communication.


Objectives

  • To design and implement a normalized, query-optimized relational schema.
  • To integrate geospatial data using MySQL’s spatial capabilities.
  • To perform analytical queries for trends, patterns, and anomalies in immigration data.
  • To visualize key metrics using R (ggplot2).

Database Schema Overview

The MySQL database (dhs_data) includes the following key tables:

Table Name Purpose
EntryLogs Tracks border crossing activities, including time, location, and passport details.
VisaApplications Records visa applications, along with approval status and dates.
SecurityAlerts Stores threat-level alerts tied to specific regions.
GeospatialData Holds coordinates and risk levels for border locations using spatial types.

Additional structures include:

  • A SQL View (VisaApprovalStats) for grouped approval statistics.
  • A Stored Procedure (GetAlertsByRegion) for quick filtering of high-threat regions.

Indexing & Optimization

To support efficient querying and filtering, indexes were created on:

  • High-cardinality fields (e.g., PassportNumber, TimeStamp)
  • Categorical filters (e.g., Country, ThreatLevel, Region)
  • Spatial columns using MySQL’s SPATIAL INDEX for geospatial operations.

Analytical SQL Queries

Visa Approval Trend (2024–2025)

SELECT
    YEAR(SubmissionDate) AS Year,
    COUNT(*) AS `Total Applications`,
    SUM(CASE WHEN ApprovalStatus = 'Approved' THEN 1 ELSE 0 END) AS `Approved Count`,
    (SUM(CASE WHEN ApprovalStatus = 'Approved' THEN 1 ELSE 0 END) / COUNT(*) * 100) AS `Approval Percentage`
FROM VisaApplications
GROUP BY YEAR(SubmissionDate)
ORDER BY Year;
library(knitr)
kable(head(visa_data))
Year Total_Applications Approved_Count Approval_Percentage
2024 276 97 35.1449
2025 24 9 37.5000

R Visuals

R code to generate the bar chart

library(ggplot2)

Plot 1: Total Applications vs Approved Count

Plot 2: Approval Percentage Line Plot

Geospatial Insights

Query: Top 5 border locations closest to New York City (POINT(40.7128 -74.0060)):

SELECT
    BorderID,
    LocationName,
    RiskLevel,
    ST_AsText(LocationPoint),
    ST_Distance_Sphere(LocationPoint, ST_GeomFromText('POINT(40.7128 -74.0060)')) AS Distance
FROM GeospatialData
ORDER BY Distance
LIMIT 5;

This highlights high-risk or sensitive locations near strategic U.S. points.

Security Alert Detection

Procedure Call:

CALL GetAlertsByRegion('Midwest');

Returns region-specific alerts with descending threat levels, aiding intelligence response.

Unusual Travel Pattern Detection

Frequent Travelers (Crossed Borders >10 Times in 2024):

SELECT PassportNumber, COUNT(*)
FROM EntryLogs
WHERE TimeStamp BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY PassportNumber
HAVING COUNT(*) > 10;
Security Breaches: High-Threat Alerts Tied to EntryLogs:SELECT EL.PassportNumber, EL.TimeStamp, SA.ThreatLevel
FROM EntryLogs EL
JOIN SecurityAlerts SA ON EL.Location = SA.Region
WHERE SA.ThreatLevel > 8;

Conclusion

This project successfully integrates a full-stack SQL backend capable of handling complex immigration datasets, including geospatial analytics and performance-tuned queries. By leveraging MySQL and R together, it provides not only clean data organization but also actionable insights for decision-making.

Recommendations

  • Future EnhancementsWeb dashboard with Flask/Streamlit to display live query results.
  • Integration with a map API (Leaflet.js or Google Maps) for geospatial data display.
  • Real data simulation for scalability testing.

Appendix

  • R Script: Included in visuals/visaapprovaltrends_plot.R
  • SQL