This project showcases a complete database-driven solution for managing and analyzing immigration and border-related data using MySQL. It simulates real-world operations of a Department of Homeland Security (DHS)-like system and includes structured data design, spatial data integration, and analytical queries. The project also integrates data visualization using R for better insight communication.
ggplot2).The MySQL database (dhs_data) includes the following key
tables:
| Table Name | Purpose |
|---|---|
EntryLogs |
Tracks border crossing activities, including time, location, and passport details. |
VisaApplications |
Records visa applications, along with approval status and dates. |
SecurityAlerts |
Stores threat-level alerts tied to specific regions. |
GeospatialData |
Holds coordinates and risk levels for border locations using spatial types. |
Additional structures include:
VisaApprovalStats) for grouped approval
statistics.GetAlertsByRegion) for quick
filtering of high-threat regions.To support efficient querying and filtering, indexes were created on:
PassportNumber,
TimeStamp)Country,
ThreatLevel, Region)SPATIAL INDEX for
geospatial operations.Visa Approval Trend (2024–2025)
SELECT
YEAR(SubmissionDate) AS Year,
COUNT(*) AS `Total Applications`,
SUM(CASE WHEN ApprovalStatus = 'Approved' THEN 1 ELSE 0 END) AS `Approved Count`,
(SUM(CASE WHEN ApprovalStatus = 'Approved' THEN 1 ELSE 0 END) / COUNT(*) * 100) AS `Approval Percentage`
FROM VisaApplications
GROUP BY YEAR(SubmissionDate)
ORDER BY Year;
library(knitr)
kable(head(visa_data))
| Year | Total_Applications | Approved_Count | Approval_Percentage |
|---|---|---|---|
| 2024 | 276 | 97 | 35.1449 |
| 2025 | 24 | 9 | 37.5000 |
R code to generate the bar chart
library(ggplot2)
Plot 1: Total Applications vs Approved
Count
Plot 2: Approval Percentage Line Plot
Query: Top 5 border locations closest to New York City (POINT(40.7128 -74.0060)):
SELECT
BorderID,
LocationName,
RiskLevel,
ST_AsText(LocationPoint),
ST_Distance_Sphere(LocationPoint, ST_GeomFromText('POINT(40.7128 -74.0060)')) AS Distance
FROM GeospatialData
ORDER BY Distance
LIMIT 5;
This highlights high-risk or sensitive locations near strategic U.S. points.
Procedure Call:
CALL GetAlertsByRegion('Midwest');
Returns region-specific alerts with descending threat levels, aiding intelligence response.
Frequent Travelers (Crossed Borders >10 Times in 2024):
SELECT PassportNumber, COUNT(*)
FROM EntryLogs
WHERE TimeStamp BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY PassportNumber
HAVING COUNT(*) > 10;
Security Breaches: High-Threat Alerts Tied to EntryLogs:SELECT EL.PassportNumber, EL.TimeStamp, SA.ThreatLevel
FROM EntryLogs EL
JOIN SecurityAlerts SA ON EL.Location = SA.Region
WHERE SA.ThreatLevel > 8;
This project successfully integrates a full-stack SQL
backend capable of handling complex immigration datasets,
including geospatial analytics and
performance-tuned queries. By leveraging
MySQL and R together, it provides not only
clean data organization but also actionable insights for
decision-making.
visuals/visaapprovaltrends_plot.RSQL