π KEY TERMS & DEFINITIONS
Primary Key - Unique identifier for each table row
(e.g., school_id) Foreign Key - Links to
primary key of another table (e.g., district_id)
Entity - Object about which data is stored (Schools,
Districts) ERD - Visual diagram showing entities and
relationships CRUD - Create, Read, Update, Delete
operations cardinality - defines the number of
instances of one entity that can be related to instances of another
entity, typically categorized as one-to-one (1:1), one-to-many (1:N), or
many-to-many (M:N)
1NF - Atomic values, no repeating groups
2NF - 1NF + no partial dependencies
3NF - 2NF + no transitive dependencies
Denormalization - Trading normalization for
performance
INNER JOIN - Only matching records from both tables
LEFT JOIN - All left records + matching right records
Aggregate Functions - COUNT(), SUM(), AVG(), MIN(),
MAX() GROUP BY - Groups rows for aggregation
Subquery - Query inside another query
Index - Improves query performance
ETL - Extract, Transform, Load data process
ORM - Maps objects to database tables (SQLAlchemy)
API - Interface for accessing data (/api/districts)
ποΈ 1. DATABASE SCHEMA
Main Entities (8 tables)
- Locations - Geographic data (county, city,
ZIP)
- Districts - 150 MS school districts
- Schools - 1,003 individual schools
- DemographicGroups - Student subgroup
classifications
- AcademicYears - Time periods for tracking
- PerformanceRecords - 18,630 literacy performance
metrics
- TeacherQuality - Teacher experience metrics
- Books - 259 book recommendations
Entity Relationships (1:M)
Locations (1) β Districts (Many) - One county/city
can have multiple school districts - FK:
Districts.location_id β Locations.location_id
- Example: Harrison County has Biloxi, Gulfport, Long Beach
districts
Districts (1) β Schools (Many)
- One district manages multiple schools - FK:
Schools.district_id β Districts.district_id -
Example: Jackson Public has 50+ individual schools
Schools (1) β PerformanceRecords (Many) - One school
has records for multiple years/demographics - FK:
PerformanceRecords.school_id β
Schools.school_id - Example: One school has records for
each grade, year, subgroup
DemographicGroups (1) β PerformanceRecords (Many) -
One demographic category (e.g., βAll Studentsβ) applies to many records
- FK: PerformanceRecords.group_id β
DemographicGroups.group_id - Example: βAll Studentsβ
subgroup appears in thousands of records
AcademicYears (1) β PerformanceRecords (Many) - One
academic year contains many performance measurements - FK:
PerformanceRecords.year_id β
AcademicYears.year_id - Example: 2023 school year has
18,630 performance records
Districts (1) β TeacherQuality (Many) - One district
has teacher quality data for multiple years - FK:
TeacherQuality.district_id β
Districts.district_id
Primary Keys: All tables use surrogate keys
(table_id) for stability and performance
Foreign Keys List
Districts.location_id β
Locations.location_id
Schools.district_id β
Districts.district_id
PerformanceRecords.school_id β
Schools.school_id
PerformanceRecords.group_id β
DemographicGroups.group_id
PerformanceRecords.year_id β
AcademicYears.year_id
TeacherQuality.district_id β
Districts.district_id
TeacherQuality.year_id β
AcademicYears.year_id
ERD Model Overview
βββββββββββββββ ββββββββββββββββ βββββββββββββββ ββββββββββββββββββββ
β Locations βββββ·β Districts βββββ·β Schools βββββ·β PerformanceRecordsβ
β location_id β 1:Mβ district_id β 1:Mβ school_id β 1:Mβ record_id β
β county β β district_nameβ β school_name β β english_proficiencyβ
β city β β location_id β β district_id β β school_id β
β zip_code β ββββββββββββββββ β school_type β β group_id β
βββββββββββββββ β grade_span β β year_id β
βββββββββββββββ ββββββββββββββββββββ
β² β²
β β
βββββββββββββββ ββββββββββββββββ β β
βTeacherQualityβ βDemographicGroupsβ βββββββββββ β
β quality_id β β group_id β 1:M β
β district_id βββββββββββββββββββββββββsubgroup_name β β
β year_id β βsubgroup_type β β
βββββββββββββββ ββββββββββββββββ β
β β
β ββββββββββββββββ β
βββββββββββββββββββββββββββββββββAcademicYears β βββββββββββββββββββ
β year_id β 1:M
β school_year β
ββββββββββββββββ
βββββββββββββββ
β Books β (Standalone - No FKs)
β book_id β
β title β
β author β
β grade_level β
βββββββββββββββ
π 2. NORMALIZATION IN DEPTH
Controlled Denormalization Decision
PerformanceRecords stores both
percentage AND count for each performance
level
Analysis: - Technically
denormalized: Could calculate percentage from count/total -
Justified because: - Source data provides both values -
Different analytical needs (percentage for comparison, count for totals)
- Avoids expensive calculations during queries - Values are atomic and
donβt create update anomalies
Alternative (Fully Normalized):
-- Could split into separate table
CREATE TABLE PerformanceLevelDetails (
record_id INT,
level_number INT,
percentage FLOAT,
count INT,
PRIMARY KEY (record_id, level_number)
);
Trade-off: More normalized but requires JOINs for every
query
π§ 3. SQL vs ORM EXAMPLES
Simple Query Examples
Example 1: Get All Schools in a District
Raw SQL:
SELECT school_name, school_type, grade_span
FROM Schools
WHERE district_id = 5;
SQLAlchemy ORM:
schools = Schools.query.filter_by(district_id=5).all()
for school in schools:
print(school.school_name, school.school_type)
Example 2: Get District with Location Info
Raw SQL:
SELECT d.district_name, l.county, l.city
FROM Districts d
JOIN Locations l ON d.location_id = l.location_id
WHERE d.district_id = 10;
SQLAlchemy ORM:
district = Districts.query.filter_by(district_id=10).first()
print(f"{district.district_name} in {district.location.county}")
Example 3: Count Schools per District
Raw SQL:
SELECT d.district_name, COUNT(s.school_id) as school_count
FROM Districts d
LEFT JOIN Schools s ON d.district_id = s.district_id
GROUP BY d.district_id, d.district_name
ORDER BY school_count DESC;
SQLAlchemy ORM:
from sqlalchemy import func
district_counts = db.session.query(
Districts.district_name,
func.count(Schools.school_id).label('school_count')
).outerjoin(Schools).group_by(Districts.district_id)\
.order_by(func.count(Schools.school_id).desc()).all()
SQL vs ORM Trade-offs Analysis
Raw SQL Advantages: - Performance
Control: Direct query optimization, no ORM overhead -
Full SQL Features: Window functions, CTEs, complex
subqueries - Debugging: See exact SQL executed, easier
to optimize - Database-Specific: Can use MySQL-specific
optimizations
Raw SQL Disadvantages: - Database
Coupling: SQL tied to specific database system
- Security Risk: Manual string building can lead to SQL
injection - Maintenance: Schema changes require
updating multiple SQL files - Type Safety: No
compile-time checking of column names/types
ORM (SQLAlchemy) Advantages: - Database
Agnostic: Same code works with MySQL, PostgreSQL, SQLite -
Type Safety: Python objects with IDE autocomplete and
error checking - Relationship Handling: Automatic joins
through defined relationships - Migration Support:
Schema changes tracked and versioned - Security:
Built-in protection against SQL injection
ORM Disadvantages: - Performance
Overhead: Additional layer between code and database -
Learning Curve: Must understand both Python ORM and
underlying SQL - Limited Control: Complex queries may
require dropping to raw SQL - Query Complexity: ORM
queries can become harder to read than SQL
Best Practice in MS Literacy Database:
CRUD Operations β ORM:
# Simple operations use ORM for safety and maintainability
new_school = Schools(
school_number=123,
school_name="New Elementary",
district_id=district.district_id,
school_type="Public"
)
db.session.add(new_school)
db.session.commit()
Analytics Queries β Raw SQL:
# Complex analytics use raw SQL for performance
analytics_query = """
SELECT county, AVG(english_proficiency) as avg_performance
FROM districts d
JOIN locations l ON d.location_id = l.location_id
JOIN schools s ON d.district_id = s.district_id
JOIN performance_records pr ON s.school_id = pr.school_id
WHERE pr.group_id = :group_id
GROUP BY county
ORDER BY avg_performance DESC
"""
results = db.session.execute(text(analytics_query), {'group_id': 1})
Data Integrity Constraints
-- Foreign key constraints ensure referential integrity
ALTER TABLE PerformanceRecords
ADD CONSTRAINT fk_performance_school
FOREIGN KEY (school_id) REFERENCES Schools(school_id) ON DELETE CASCADE;
-- Check constraints validate business rules
ALTER TABLE PerformanceRecords
ADD CONSTRAINT chk_proficiency_range
CHECK (english_proficiency >= 0 AND english_proficiency <= 100);
-- NOT NULL constraints ensure required data
ALTER TABLE PerformanceRecords
MODIFY COLUMN school_id INT NOT NULL;
π 7. EXAM PRACTICE QUESTIONS
Entity Relationships (15 points)
Q1: Explain why the relationship between Schools and
PerformanceRecords is 1:Many rather than 1:1. Give a specific
example.
Normalization Analysis (20 points)
Q2: Analyze this proposed table design. What normal
form violations exist?
CREATE TABLE SchoolSummary (
school_id INT,
school_name VARCHAR(255),
district_name VARCHAR(255), -- Violation?
county VARCHAR(100), -- Violation?
math_score_2022 FLOAT,
math_score_2023 FLOAT, -- Violation?
english_score_2022 FLOAT,
english_score_2023 FLOAT -- Violation?
);
Q4: The PerformanceRecords table stores both
percentage and count for each level. Is this normalized? Justify your
answer with pros/cons.
SQL vs ORM (25 points)
Q3: Convert this SQL query to SQLAlchemy ORM
syntax:
SELECT school_name, school_type
FROM Schools
WHERE district_id = 15 AND school_type = 'Elementary';
Q4: Write the SQL equivalent of this ORM query:
books = Books.query.filter_by(grade_level='3rd Grade').order_by(Books.title).all()
Good luck with your database exam! π
Focused study guide covering core database concepts using the
Mississippi Literacy Database.