This script loads and validates the structure of a reviewer dataset stored in an Excel file, ensuring column integrity and identifying nulls in key fields.

import pandas as pd

# Load the Excel file
file_path = '/Users/cynthiamcginnis/Downloads/Reviewers_Schema_Prototype.xlsx'
df = pd.read_excel(file_path)

# Display first few rows
print("Preview of data:")
print(df.head())

# Check column names for structure validation
expected_columns = [
    "employee_id", "first_name", "last_name", "email", "department", "role",
    "expertise_areas", "experience_level", "max_reviews_per_month", "active",
    "last_review_date", "total_reviews_completed", "current_workload",
    "performance_rating", "notes"
]

print("\nValidating column structure...")
missing_columns = [col for col in expected_columns if col not in df.columns]
extra_columns = [col for col in df.columns if col not in expected_columns]

if not missing_columns and not extra_columns:
    print(" 👍🏻 Column structure is valid.")
else:
    print(" 👎 Column validation issues detected:")
    if missing_columns:
        print(f" - Missing columns: {missing_columns}")
    if extra_columns:
        print(f" - Unexpected columns: {extra_columns}")

# Optional: check for null values in critical columns
critical_columns = ["employee_id", "email", "active"]
print("\nChecking for null values in critical fields...")
for col in critical_columns:
    null_count = df[col].isnull().sum()
    print(f" - {col}: {null_count} null values")

# Example of business logic validation
print("\nChecking for logical constraints...")
invalid_workloads = df[df["current_workload"] > df["max_reviews_per_month"]]
if not invalid_workloads.empty:
    print(f"⚠️ Found {len(invalid_workloads)} rows with workload exceeding max allowed:")
    print(invalid_workloads[["employee_id", "current_workload", "max_reviews_per_month"]])
else:
    print(" 🫶 All workload constraints satisfied.")
Preview of data:
  employee_id first_name last_name                        email    department  \
0      EMP001    Breanna      King             vcox@hotmail.com            QA   
1      EMP002      Emily    Conway  zimmermanalexis@hotmail.com  Data Science   
2      EMP003      Jacob  Sullivan            jatkins@yahoo.com   Engineering   
3      EMP004  Christina     Yates   carrierobinson@salazar.com   Engineering   
4      EMP005    Jessica   Leblanc    keith68@walker-young.info  Data Science   

               role          expertise_areas  experience_level  \
0        QA Analyst       get, sense, office                 4   
1  Senior Developer      economy, sort, last                 4   
2        QA Analyst      stay, far, strategy                 2   
3  Senior Developer          yard, why, tell                 4   
4  Junior Developer  operation, until, table                 5   

   max_reviews_per_month  active last_review_date  total_reviews_completed  \
0                      2    True       2025-02-27                       37   
1                      2   False       2025-03-24                       12   
2                      3    True       2025-02-24                       41   
3                      5   False       2025-02-24                       28   
4                      3   False       2025-04-08                       42   

   current_workload  performance_rating  \
0                 0                 2.5   
1                 1                 3.8   
2                 0                 1.6   
3                 3                 3.8   
4                 4                 3.0   

                                           notes  
0        Voice speak service themselves finally.  
1                Guess left various institution.  
2                 Statement help anything would.  
3  Throughout determine guess president manager.  
4                    Sense indicate their power.  

Validating column structure...
 👍🏻 Column structure is valid.

Checking for null values in critical fields...
 - employee_id: 0 null values
 - email: 0 null values
 - active: 0 null values

Checking for logical constraints...
⚠️ Found 6 rows with workload exceeding max allowed:
   employee_id  current_workload  max_reviews_per_month
4       EMP005                 4                      3
6       EMP007                 5                      4
10      EMP011                 3                      2
19      EMP020                 3                      2
26      EMP027                 3                      2
29      EMP030                 4                      3