Project Review Scheduler

Software Requirements & Design Document Presentation

MIST 5910 - Capstone Project

Presentation Overview

Problem Context
Software Requirements
Key Takeaways

Problem Context

The company project review process is currently managed via Excel spreadsheets
Manual process has documented a 15% oversight rate for required reviews
30% of projects undergo unnecessary multiple reviews
Administrative staff spend 15-20 hours monthly tracking review schedules
22% error rate in identifying review candidates
No systematic reviewer assignment mechanism
No centralized visibility into review pipeline

Software Requirements (1/7)

Requirement 1: Automated Review Due Date Calculator

Description: Automatically calculate the next review date for each project based on its configured review frequency.

Justification: Eliminate the 15% oversight rate of required reviews and 30% redundancy rate.

import pandas as pd
from datetime import datetime

# Create sample project data
projects_data = {
    'Project ID': ['P001', 'P002', 'P003', 'P004', 'P005'],
    'Project Name': ['Website Redesign', 'Mobile App Development', 'Database Migration', 'Network Security Upgrade', 'AI Implementation'],
    'Start Date': ['2024-01-15', '2023-08-22', '2024-02-10', '2023-11-05', '2023-09-18'],
    'Last Review Date': ['2024-03-10', '2024-01-15', '2024-03-25', '2023-12-20', '2024-02-05'],
    'Review Frequency (Years)': [0.25, 0.5, 0.25, 0.5, 1.0],  # 3 months, 6 months, 3 months, 6 months, 1 year
    'Department': ['Marketing', 'Development', 'IT', 'Security', 'Research']
}

# Convert to DataFrame
projects = pd.DataFrame(projects_data)

# Convert date strings to datetime objects
projects['Start Date'] = pd.to_datetime(projects['Start Date'])
projects['Last Review Date'] = pd.to_datetime(projects['Last Review Date'])

# Set today's date for demonstration (May 7, 2025)
today = pd.to_datetime('2025-05-07')

# Calculate next review date based on configured frequency
projects['Next Review Date'] = projects['Last Review Date'] + pd.to_timedelta(projects['Review Frequency (Years)'] * 365, unit='D')
projects['Days Until Review'] = (projects['Next Review Date'] - today).dt.days

# Categorize projects by review status
projects['Status'] = projects['Days Until Review'].apply(
    lambda x: "Overdue" if x < 0 else ("Due Soon" if x <= 30 else "Up to Date")
)

# Display the results
print("Today's date:", today.strftime('%Y-%m-%d'))
print("\nProject Review Status:")
print(projects[['Project ID', 'Project Name', 'Last Review Date', 'Next Review Date', 'Days Until Review', 'Status']])

# Summarize the review status
status_counts = projects['Status'].value_counts()
print("\nSummary:")
for status, count in status_counts.items():
    print(f"{status}: {count} projects")

Today's date: 2025-05-07

Project Review Status:
  Project ID              Project Name Last Review Date    Next Review Date  \
0       P001          Website Redesign       2024-03-10 2024-06-09 06:00:00   
1       P002    Mobile App Development       2024-01-15 2024-07-15 12:00:00   
2       P003        Database Migration       2024-03-25 2024-06-24 06:00:00   
3       P004  Network Security Upgrade       2023-12-20 2024-06-19 12:00:00   
4       P005         AI Implementation       2024-02-05 2025-02-04 00:00:00   

   Days Until Review   Status  
0               -332  Overdue  
1               -296  Overdue  
2               -317  Overdue  
3               -322  Overdue  
4                -92  Overdue  

Summary:
Overdue: 5 projects

Software Requirements (2/7)

Requirement 2: Balanced Reviewer Assignment Algorithm

Description: Fairly distribute review assignments among qualified reviewers.

Justification: Address the “no systematic assignment mechanism” that results in workload imbalance.

import pandas as pd

# Create sample user data (reviewers)
users_data = {
    'User ID': ['U001', 'U002', 'U003', 'U004', 'U005'],
    'Name': ['John Smith', 'Emily Chen', 'David Lee', 'Sarah Johnson', 'Michael Wong'],
    'Email': ['john.smith@company.com', 'emily.chen@company.com', 'david.lee@company.com', 
              'sarah.johnson@company.com', 'michael.wong@company.com'],
    'Department': ['Development', 'QA', 'Security', 'Marketing', 'Research'],
    'Current Load': [2, 0, 3, 1, 0]  # Number of currently assigned reviews
}

# Convert to DataFrame
users = pd.DataFrame(users_data)

# Assume we need to assign reviews for these projects that need review
due_projects_data = {
    'Project ID': ['P001', 'P002', 'P003'],
    'Project Name': ['Website Redesign', 'Mobile App Development', 'Database Migration'],
    'Department': ['Marketing', 'Development', 'IT'],
    'Next Review Date': ['2025-05-20', '2025-06-15', '2025-06-01']
}

# Convert to DataFrame
due_projects = pd.DataFrame(due_projects_data)
due_projects['Next Review Date'] = pd.to_datetime(due_projects['Next Review Date'])

# Initialize an empty list for review assignments and a starting ID
review_assignments = []
review_id_start = 1001

# Print initial workload
print("Initial reviewer workload:")
print(users[['User ID', 'Name', 'Department', 'Current Load']])
print("\nProjects requiring review:")
print(due_projects[['Project ID', 'Project Name', 'Next Review Date']])

# Assign reviewers based on current workload
print("\nAssignment process:")
for _, proj in due_projects.iterrows():
    # Sort users by current workload and select the one with lowest load
    sorted_users = users.sort_values('Current Load')
    user = sorted_users.iloc[0]
    
    print(f"Assigning {proj['Project Name']} to {user['Name']} (current load: {user['Current Load']})")
    
    # Create new review assignment
    review_assignments.append({
        "Review ID": review_id_start,
        "Project ID": proj["Project ID"],
        "Reviewer ID": user["User ID"],
        "Scheduled Date": proj["Next Review Date"],
        "Status": "Scheduled",
        "Completion Date": ""
    })
    
    # Update reviewer's workload
    users.loc[users["User ID"] == user["User ID"], "Current Load"] += 1
    review_id_start += 1

# Convert assignments to DataFrame
assignments_df = pd.DataFrame(review_assignments)

# Print the results
print("\nFinal review assignments:")
print(assignments_df)

print("\nUpdated reviewer workload:")
print(users[['User ID', 'Name', 'Current Load']])

Initial reviewer workload:
  User ID           Name   Department  Current Load
0    U001     John Smith  Development             2
1    U002     Emily Chen           QA             0
2    U003      David Lee     Security             3
3    U004  Sarah Johnson    Marketing             1
4    U005   Michael Wong     Research             0

Projects requiring review:
  Project ID            Project Name Next Review Date
0       P001        Website Redesign       2025-05-20
1       P002  Mobile App Development       2025-06-15
2       P003      Database Migration       2025-06-01

Assignment process:
Assigning Website Redesign to Emily Chen (current load: 0)
Assigning Mobile App Development to Michael Wong (current load: 0)
Assigning Database Migration to Emily Chen (current load: 1)

Final review assignments:
   Review ID Project ID Reviewer ID Scheduled Date     Status Completion Date
0       1001       P001        U002     2025-05-20  Scheduled                
1       1002       P002        U005     2025-06-15  Scheduled                
2       1003       P003        U002     2025-06-01  Scheduled                

Updated reviewer workload:
  User ID           Name  Current Load
0    U001     John Smith             2
1    U002     Emily Chen             2
2    U003      David Lee             3
3    U004  Sarah Johnson             1
4    U005   Michael Wong             1

Software Requirements (3/7)

Requirement 3: Automated Email Notification System

Description: Automatically send email notifications to assigned reviewers about upcoming reviews.

Justification: Research shows “providing at least two weeks’ advance notice improves evaluation thoroughness by 45%” (KPMG, 2016).

import pandas as pd
from datetime import datetime

# Create sample users data
users_data = {
    'User ID': ['U002', 'U004', 'U005'],
    'Name': ['Emily Chen', 'Sarah Johnson', 'Michael Wong'],
    'Email': ['emily.chen@company.com', 'sarah.johnson@company.com', 'michael.wong@company.com'],
    'Department': ['QA', 'Marketing', 'Research'],
    'Current Load': [1, 2, 1]
}
users = pd.DataFrame(users_data)

# Create sample projects data
projects_data = {
    'Project ID': ['P001', 'P002', 'P003'],
    'Project Name': ['Website Redesign', 'Mobile App Development', 'Database Migration'],
    'Department': ['Marketing', 'Development', 'IT'],
    'Start Date': ['2024-01-15', '2023-08-22', '2024-02-10']
}
projects = pd.DataFrame(projects_data)

# Create sample review assignments
reviews_data = {
    'Review ID': [1001, 1002, 1003],
    'Project ID': ['P001', 'P002', 'P003'],
    'Reviewer ID': ['U002', 'U005', 'U004'],
    'Scheduled Date': ['2025-05-20', '2025-06-15', '2025-06-01'],
    'Status': ['Scheduled', 'Scheduled', 'Scheduled'],
    'Completion Date': ['', '', '']
}
reviews = pd.DataFrame(reviews_data)
reviews['Scheduled Date'] = pd.to_datetime(reviews['Scheduled Date'])

# Merge data to create a complete view for notifications
merged = reviews.merge(users, left_on="Reviewer ID", right_on="User ID")
merged = merged.merge(projects[["Project ID", "Project Name"]], on="Project ID")

# Display the merged data that will be used for notifications
print("Merged data for email notifications:")
print(merged[['Review ID', 'Project ID', 'Project Name', 'Reviewer ID', 'Name', 'Email', 'Scheduled Date', 'Status']])

# Example of generating notification emails
print("\nExample Email Notifications:")
for _, row in merged.iterrows():
    subject = f"Review Assignment: {row['Project Name']}"
    body = f"""
    Hi {row['Name']},
    
    You have been assigned to review '{row['Project Name']}' scheduled for {row['Scheduled Date'].strftime('%Y-%m-%d')}.
    This review is part of our regular quality assessment process.
    Please ensure the review is completed on time.
    
    Regards,
    Project Scheduler System
    """
    
    print("-" * 50)
    print(f"To: {row['Email']}")
    print(f"Subject: {subject}")
    print(body)

Merged data for email notifications:
   Review ID Project ID            Project Name Reviewer ID           Name  \
0       1001       P001        Website Redesign        U002     Emily Chen   
1       1002       P002  Mobile App Development        U005   Michael Wong   
2       1003       P003      Database Migration        U004  Sarah Johnson   

                       Email Scheduled Date     Status  
0     emily.chen@company.com     2025-05-20  Scheduled  
1   michael.wong@company.com     2025-06-15  Scheduled  
2  sarah.johnson@company.com     2025-06-01  Scheduled  

Example Email Notifications:
--------------------------------------------------
To: emily.chen@company.com
Subject: Review Assignment: Website Redesign

    Hi Emily Chen,
    
    You have been assigned to review 'Website Redesign' scheduled for 2025-05-20.
    This review is part of our regular quality assessment process.
    Please ensure the review is completed on time.
    
    Regards,
    Project Scheduler System
    
--------------------------------------------------
To: michael.wong@company.com
Subject: Review Assignment: Mobile App Development

    Hi Michael Wong,
    
    You have been assigned to review 'Mobile App Development' scheduled for 2025-06-15.
    This review is part of our regular quality assessment process.
    Please ensure the review is completed on time.
    
    Regards,
    Project Scheduler System
    
--------------------------------------------------
To: sarah.johnson@company.com
Subject: Review Assignment: Database Migration

    Hi Sarah Johnson,
    
    You have been assigned to review 'Database Migration' scheduled for 2025-06-01.
    This review is part of our regular quality assessment process.
    Please ensure the review is completed on time.
    
    Regards,
    Project Scheduler System

Software Requirements (4/7)

Requirement 4: CSV-Based Data Storage Structure

Description: Use three interconnected CSV files to store all necessary data.

Justification: Current Excel spreadsheets are error-prone. Structured CSV approach allows for “minimal dependencies and infrastructure requirements.”

CSV Schema Design: - Projects.csv (Project_ID, Project_Name, Start_Date, Last_Review_Date, Review_Frequency_Years, Department) - Users.csv (User_ID, Name, Email, Department, Current_Load) - Reviews.csv (Review_ID, Project_ID, Reviewer_ID, Scheduled_Date, Status, Completion_Date)

import pandas as pd
import matplotlib.pyplot as plt

# Create more comprehensive sample user data with varying workloads
users_data = {
    'User ID': ['U001', 'U002', 'U003', 'U004', 'U005', 'U006', 'U007', 'U008'],
    'Name': ['John Smith', 'Emily Chen', 'David Lee', 'Sarah Johnson', 
             'Michael Wong', 'Laura Garcia', 'Robert Kim', 'Jennifer Patel'],
    'Department': ['Development', 'QA', 'Security', 'Marketing', 
                  'Research', 'IT', 'Development', 'QA'],
    'Current Load': [4, 1, 5, 2, 1, 3, 0, 2]  # Number of currently assigned reviews
}

# Convert to DataFrame
users = pd.DataFrame(users_data)

# Display the workload data
print("Current reviewer workload data:")
print(users[['Name', 'Department', 'Current Load']])

# Create workload distribution chart
plt.figure(figsize=(10, 6))
user_loads = users[["Name", "Current Load"]].sort_values("Current Load", ascending=False)
plt.bar(user_loads["Name"], user_loads["Current Load"])
plt.title("Current Reviewer Workload Distribution")
plt.xlabel("Reviewer")
plt.ylabel("Number of Assigned Reviews")
plt.xticks(rotation=45, ha='right')
plt.tight_layout()

# Save the chart (would happen in actual implementation)
# plt.savefig("workload_distribution.png")

# Calculate workload statistics
total_reviews = users['Current Load'].sum()
avg_workload = users['Current Load'].mean()
max_workload = users['Current Load'].max()
min_workload = users['Current Load'].min()

print("\nWorkload Distribution Summary:")
print(f"Total assigned reviews: {total_reviews}")
print(f"Average reviews per reviewer: {avg_workload:.2f}")
print(f"Maximum workload: {max_workload} reviews")
print(f"Minimum workload: {min_workload} reviews")

# Example of detailed report text
report_text = f"""
Monthly Workload Distribution Report
Generated: May 7, 2025

Summary:
- Total reviews in progress: {total_reviews}
- Average workload per reviewer: {avg_workload:.2f} reviews
- Reviewers with the highest workload: {users.loc[users['Current Load'] == max_workload, 'Name'].values[0]} ({max_workload} reviews)
- Reviewers with lowest workload: {users.loc[users['Current Load'] == min_workload, 'Name'].values[0]} ({min_workload} reviews)
- Workload variance: {users['Current Load'].var():.2f}

Recommendations:
- Consider reassigning reviews from David Lee (5 reviews) to Robert Kim (0 reviews)
- Monitor workload balance for upcoming review assignments
- Ensure no reviewer exceeds the maximum recommended load of 5 concurrent reviews
"""

print("\nExample Report Text:")
print(report_text)

Current reviewer workload data:
             Name   Department  Current Load
0      John Smith  Development             4
1      Emily Chen           QA             1
2       David Lee     Security             5
3   Sarah Johnson    Marketing             2
4    Michael Wong     Research             1
5    Laura Garcia           IT             3
6      Robert Kim  Development             0
7  Jennifer Patel           QA             2

Workload Distribution Summary:
Total assigned reviews: 18
Average reviews per reviewer: 2.25
Maximum workload: 5 reviews
Minimum workload: 0 reviews

Example Report Text:

Monthly Workload Distribution Report
Generated: May 7, 2025

Summary:
- Total reviews in progress: 18
- Average workload per reviewer: 2.25 reviews
- Reviewers with highest workload: David Lee (5 reviews)
- Reviewers with lowest workload: Robert Kim (0 reviews)
- Workload variance: 2.79

Recommendations:
- Consider reassigning reviews from David Lee (5 reviews) to Robert Kim (0 reviews)
- Monitor workload balance for upcoming review assignments
- Ensure no reviewer exceeds maximum recommended load of 5 concurrent reviews

png

Software Requirements (6/7)

Requirement 6: Error Handling and Data Validation

Description: Validate all data inputs and maintain data integrity across CSV files.

Justification: Current manual process has a “documented 22% error rate.”

import pandas as pd
import numpy as np
from datetime import datetime

# Create sample project data with some validation issues
projects_data = {
    'Project ID': ['P001', 'P002', 'P003', 'P004', 'P005', None],
    'Project Name': ['Website Redesign', 'Mobile App Development', None, 'Network Security Upgrade', 'AI Implementation', 'Cloud Migration'],
    'Start Date': ['2024-01-15', '2023-08-22', '2024-02-10', '2023-11-05', '2023-09-18', '2024-03-01'],
    'Last Review Date': ['2024-03-10', '2024-01-15', '2024-03-25', None, '2024-02-05', '2024-04-01'],
    'Review Frequency (Years)': [0.25, 0.5, 0.25, 0.5, -0.2, 1.0],  # Note the negative value
    'Department': ['Marketing', 'Development', 'IT', 'Security', 'Research', 'IT']
}

# Convert to DataFrame
projects = pd.DataFrame(projects_data)

# Try to convert date strings to datetime objects
try:
    projects['Start Date'] = pd.to_datetime(projects['Start Date'])
    projects['Last Review Date'] = pd.to_datetime(projects['Last Review Date'])
except Exception as e:
    print(f"Date conversion error: {e}")

# Display the data with potential validation issues
print("Sample project data (with validation issues):")
print(projects)

# Define the validation function
def validate_project_data(df):
    """Validate project data for required fields and correct data types."""
    errors = []
    
    # Check for missing values in required fields
    required_fields = ["Project ID", "Project Name", "Start Date", "Last Review Date", "Review Frequency (Years)"]
    for field in required_fields:
        if df[field].isnull().any():
            missing_count = df[field].isnull().sum()
            missing_indices = df[df[field].isnull()].index.tolist()
            errors.append(f"Missing values in required field: {field} ({missing_count} rows, indices: {missing_indices})")
    
    # Validate date fields
    try:
        pd.to_datetime(df["Start Date"])
        pd.to_datetime(df["Last Review Date"])
    except Exception as e:
        errors.append(f"Invalid date format: {e}")
    
    # Validate review frequency is positive
    if (df["Review Frequency (Years)"] <= 0).any():
        negative_indices = df[df["Review Frequency (Years)"] <= 0].index.tolist()
        errors.append(f"Review frequency must be positive (found negative/zero values at indices: {negative_indices})")
    
    # Validate logical date sequence (Start Date must be before Last Review Date)
    try:
        valid_dates_mask = (~df["Start Date"].isnull()) & (~df["Last Review Date"].isnull())
        if (df.loc[valid_dates_mask, "Start Date"] > df.loc[valid_dates_mask, "Last Review Date"]).any():
            invalid_sequence_indices = df[df["Start Date"] > df["Last Review Date"]].index.tolist()
            errors.append(f"Start Date must be before Last Review Date (invalid sequence at indices: {invalid_sequence_indices})")
    except Exception as e:
        errors.append(f"Error checking date sequence: {e}")
    
    return errors

# Run validation on the sample data
validation_errors = validate_project_data(projects)

# Display validation results
print("\nValidation Results:")
if validation_errors:
    print(f"Found {len(validation_errors)} validation errors:")
    for i, error in enumerate(validation_errors, 1):
        print(f"{i}. {error}")
else:
    print("No validation errors found.")

# Example of how validation could be used in practice
print("\nData Validation in Practice:")
if validation_errors:
    print("Data validation failed. Please fix the following issues before proceeding:")
    for error in validation_errors:
        print(f"- {error}")
    print("\nProcess halted due to validation errors.")
else:
    print("Data validation passed. Proceeding with project review calculations...")
    # Code would proceed with normal operations here

Sample project data (with validation issues):
  Project ID              Project Name Start Date Last Review Date  \
0       P001          Website Redesign 2024-01-15       2024-03-10   
1       P002    Mobile App Development 2023-08-22       2024-01-15   
2       P003                      None 2024-02-10       2024-03-25   
3       P004  Network Security Upgrade 2023-11-05              NaT   
4       P005         AI Implementation 2023-09-18       2024-02-05   
5       None           Cloud Migration 2024-03-01       2024-04-01   

   Review Frequency (Years)   Department  
0                      0.25    Marketing  
1                      0.50  Development  
2                      0.25           IT  
3                      0.50     Security  
4                     -0.20     Research  
5                      1.00           IT  

Validation Results:
Found 4 validation errors:
1. Missing values in required field: Project ID (1 rows, indices: [5])
2. Missing values in required field: Project Name (1 rows, indices: [2])
3. Missing values in required field: Last Review Date (1 rows, indices: [3])
4. Review frequency must be positive (found negative/zero values at indices: [4])

Data Validation in Practice:
Data validation failed. Please fix the following issues before proceeding:
- Missing values in required field: Project ID (1 rows, indices: [5])
- Missing values in required field: Project Name (1 rows, indices: [2])
- Missing values in required field: Last Review Date (1 rows, indices: [3])
- Review frequency must be positive (found negative/zero values at indices: [4])

Process halted due to validation errors.

Software Requirements (7/7)

Requirement 7: Documentation and User Guide

Description: The system must include comprehensive documentation, including user guides for administrative staff, technical specifications, and inline code comments.

Justification: “Clear documentation of legacy systems is essential when transitioning to automated frameworks” (Hooda et al., 2023).

Documentation Components:

1. User Guide (For Administrative Staff)

GETTING STARTED WITH PROJECT REVIEW SCHEDULER

1. OVERVIEW
   The Project Review Scheduler helps you automate the tracking, 
   assignment, and notification of project reviews.

2. MAIN COMMANDS
   - calculate_reviews: Identifies which projects need review
   - assign_reviewers: Distributes reviews to available reviewers
   - send_notifications: Sends email alerts to assigned reviewers
   - generate_reports: Creates monthly schedule and workload reports

2. Technical Specifications (For Developers)

def generate_documentation():
    """ Generate comprehensive system documentation from code and templates."""
    
    # Generate user guide from templates
    user_guide = create_user_guide()
    
    # Extract technical documentation from code docstrings
    tech_specs = extract_technical_specs()
    
    # Generate installation guide
    install_guide = create_installation_guide()
    
    # Save documentation to appropriate locations
    for filename, content in documentation_package.items():
        save_documentation(filename, content)

3. Documentation Generation Process

image.png

Documentation Benefits:

System Adoption: Clear user guides ensure administrative staff can effectively use the system
Knowledge Transfer: Technical specifications preserve implementation details
Future Maintainability: Well-documented code reduces maintenance costs
Training Support: User guides serve as training materials for new staff

Software Requirements - Key Takeaways

1/7: Automated Review Due Date Calculator

Key Takeaway: The automated calculator eliminates the 15% oversight rate by systematically identifying projects needing review
Automatically categorizes projects as “Overdue,” “Due Soon,” or “Up to Date”
Ensures no projects fall through the cracks, addressing core inefficiencies in the manual process

2/7: Balanced Reviewer Assignment Algorithm

Key Takeaway: Ensures fair workload distribution by prioritizing team members with fewer active assignments
Solves the problem of overburdened reviewers while others remain underutilized
Improves both efficiency and fairness in the review process

3/7: Automated Email Notification System

Key Takeaway: Structured, consistent email notifications improve communication and accountability
Provides reviewers with clear, timely information about their assignments
Research shows two weeks’ advance notice improves evaluation thoroughness by 45%

4/7: CSV-Based Data Storage Structure

Key Takeaway: Three-file CSV structure provides lightweight but robust data architecture
Replaces error-prone Excel spreadsheets with minimal dependencies
Maintains clear relationships between entities for easy implementation and maintenance

5/7: Reporting and Visualization Capabilities

Key Takeaway: Visual reports provide management with centralized visibility into the review pipeline
Enables effective monitoring of completion rates and workload distribution
Addresses leadership’s inability to identify bottlenecks or ensure regulatory compliance

6/7: Error Handling and Data Validation

Key Takeaway: Comprehensive validation reduces the current 22% error rate
Enforces data integrity rules across all inputs
Prevents cascading errors and ensures reliable operation regardless of input quality

7/7: Documentation and User Guide

Key Takeaway: Complete documentation ensures successful adoption and long-term maintainability
Provides tailored materials for administrative staff, developers, and IT teams
Supports knowledge transfer and reduces dependency on original developers