DHS Data Management & Analysis - MySQL Portfolio Project

Overview

This project demonstrates a robust MySQL-based system for managing and analyzing Department of Homeland Security (DHS)-style immigration and border security data. It covers the full spectrum from data schema design to performance optimization, geospatial querying, and analytical reporting.

Technologies Used

  • Database: MySQL (with support for spatial extensions)
  • Tools: MySQL Workbench / DBeaver (for schema execution and querying)
  • Data Types: ENUMs, DATETIME, BIGINT, POINT (spatial data)
  • Features: Indexing, Views, Stored Procedures, Geospatial Queries

Database Schema

The database (dhs_data) contains the following tables:

  • EntryLogs – Tracks individual border crossings with timestamps and locations.
  • VisaApplications – Stores applicant data, including visa type, approval status, and submission date.
  • SecurityAlerts – Contains threat-level alerts per region and incident descriptions.
  • GeospatialData – Stores coordinates and risk levels for various border locations.
  • VisaApprovalStats – A SQL View summarizing visa approval data.
  • GetAlertsByRegion – A stored procedure for filtering alerts by region.

Key Features

  • Data Normalization: Ensures redundancy-free design with relationships across passport numbers and regions.
  • Index Optimization: Strategic indexes improve query performance on frequent filters and joins.
  • Geospatial Capability: Uses MySQL’s spatial extensions to calculate distances and perform geographic filtering.
  • Analysis Queries:
    • Visa approvals over time
    • Border crossing trends by location and year
    • Frequent travelers (anomalies)
    • High-risk alerts by region
  • Procedure Automation: Dynamic stored procedure to retrieve alerts by region, sorted by threat level.

How to Use

  1. Open MySQL Workbench or any SQL client connected to a MySQL server.
  2. Run the script dhs_data.sql in sequence to:
    • Create the database and tables
    • Populate or modify relevant data
    • Create views, indexes, and procedures
  3. Run the analysis queries at the bottom of the script for insights.

Sample Queries

  • Top 5 Closest High-Risk Borders to NYC:
  SELECT LocationName, RiskLevel, ST_AsText(LocationPoint)
  FROM GeospatialData
  ORDER BY ST_Distance_Sphere(LocationPoint, ST_GeomFromText('POINT(40.7128 -74.0060)'))
  LIMIT 5;