MIST 5910 - Capstone Project
Description: Automatically calculate the next review date for each project based on its configured review frequency.
Justification: Eliminate the 15% oversight rate of required reviews and 30% redundancy rate.
import pandas as pd
from datetime import datetime
# Create sample project data
projects_data = {
'Project ID': ['P001', 'P002', 'P003', 'P004', 'P005'],
'Project Name': ['Website Redesign', 'Mobile App Development', 'Database Migration', 'Network Security Upgrade', 'AI Implementation'],
'Start Date': ['2024-01-15', '2023-08-22', '2024-02-10', '2023-11-05', '2023-09-18'],
'Last Review Date': ['2024-03-10', '2024-01-15', '2024-03-25', '2023-12-20', '2024-02-05'],
'Review Frequency (Years)': [0.25, 0.5, 0.25, 0.5, 1.0], # 3 months, 6 months, 3 months, 6 months, 1 year
'Department': ['Marketing', 'Development', 'IT', 'Security', 'Research']
}
# Convert to DataFrame
projects = pd.DataFrame(projects_data)
# Convert date strings to datetime objects
projects['Start Date'] = pd.to_datetime(projects['Start Date'])
projects['Last Review Date'] = pd.to_datetime(projects['Last Review Date'])
# Set today's date for demonstration (May 7, 2025)
today = pd.to_datetime('2025-05-07')
# Calculate next review date based on configured frequency
projects['Next Review Date'] = projects['Last Review Date'] + pd.to_timedelta(projects['Review Frequency (Years)'] * 365, unit='D')
projects['Days Until Review'] = (projects['Next Review Date'] - today).dt.days
# Categorize projects by review status
projects['Status'] = projects['Days Until Review'].apply(
lambda x: "Overdue" if x < 0 else ("Due Soon" if x <= 30 else "Up to Date")
)
# Display the results
print("Today's date:", today.strftime('%Y-%m-%d'))
print("\nProject Review Status:")
print(projects[['Project ID', 'Project Name', 'Last Review Date', 'Next Review Date', 'Days Until Review', 'Status']])
# Summarize the review status
status_counts = projects['Status'].value_counts()
print("\nSummary:")
for status, count in status_counts.items():
print(f"{status}: {count} projects")
Today's date: 2025-05-07
Project Review Status:
Project ID Project Name Last Review Date Next Review Date \
0 P001 Website Redesign 2024-03-10 2024-06-09 06:00:00
1 P002 Mobile App Development 2024-01-15 2024-07-15 12:00:00
2 P003 Database Migration 2024-03-25 2024-06-24 06:00:00
3 P004 Network Security Upgrade 2023-12-20 2024-06-19 12:00:00
4 P005 AI Implementation 2024-02-05 2025-02-04 00:00:00
Days Until Review Status
0 -332 Overdue
1 -296 Overdue
2 -317 Overdue
3 -322 Overdue
4 -92 Overdue
Summary:
Overdue: 5 projects
Description: Fairly distribute review assignments among qualified reviewers.
Justification: Address the “no systematic assignment mechanism” that results in workload imbalance.
import pandas as pd
# Create sample user data (reviewers)
users_data = {
'User ID': ['U001', 'U002', 'U003', 'U004', 'U005'],
'Name': ['John Smith', 'Emily Chen', 'David Lee', 'Sarah Johnson', 'Michael Wong'],
'Email': ['john.smith@company.com', 'emily.chen@company.com', 'david.lee@company.com',
'sarah.johnson@company.com', 'michael.wong@company.com'],
'Department': ['Development', 'QA', 'Security', 'Marketing', 'Research'],
'Current Load': [2, 0, 3, 1, 0] # Number of currently assigned reviews
}
# Convert to DataFrame
users = pd.DataFrame(users_data)
# Assume we need to assign reviews for these projects that need review
due_projects_data = {
'Project ID': ['P001', 'P002', 'P003'],
'Project Name': ['Website Redesign', 'Mobile App Development', 'Database Migration'],
'Department': ['Marketing', 'Development', 'IT'],
'Next Review Date': ['2025-05-20', '2025-06-15', '2025-06-01']
}
# Convert to DataFrame
due_projects = pd.DataFrame(due_projects_data)
due_projects['Next Review Date'] = pd.to_datetime(due_projects['Next Review Date'])
# Initialize an empty list for review assignments and a starting ID
review_assignments = []
review_id_start = 1001
# Print initial workload
print("Initial reviewer workload:")
print(users[['User ID', 'Name', 'Department', 'Current Load']])
print("\nProjects requiring review:")
print(due_projects[['Project ID', 'Project Name', 'Next Review Date']])
# Assign reviewers based on current workload
print("\nAssignment process:")
for _, proj in due_projects.iterrows():
# Sort users by current workload and select the one with lowest load
sorted_users = users.sort_values('Current Load')
user = sorted_users.iloc[0]
print(f"Assigning {proj['Project Name']} to {user['Name']} (current load: {user['Current Load']})")
# Create new review assignment
review_assignments.append({
"Review ID": review_id_start,
"Project ID": proj["Project ID"],
"Reviewer ID": user["User ID"],
"Scheduled Date": proj["Next Review Date"],
"Status": "Scheduled",
"Completion Date": ""
})
# Update reviewer's workload
users.loc[users["User ID"] == user["User ID"], "Current Load"] += 1
review_id_start += 1
# Convert assignments to DataFrame
assignments_df = pd.DataFrame(review_assignments)
# Print the results
print("\nFinal review assignments:")
print(assignments_df)
print("\nUpdated reviewer workload:")
print(users[['User ID', 'Name', 'Current Load']])
Initial reviewer workload:
User ID Name Department Current Load
0 U001 John Smith Development 2
1 U002 Emily Chen QA 0
2 U003 David Lee Security 3
3 U004 Sarah Johnson Marketing 1
4 U005 Michael Wong Research 0
Projects requiring review:
Project ID Project Name Next Review Date
0 P001 Website Redesign 2025-05-20
1 P002 Mobile App Development 2025-06-15
2 P003 Database Migration 2025-06-01
Assignment process:
Assigning Website Redesign to Emily Chen (current load: 0)
Assigning Mobile App Development to Michael Wong (current load: 0)
Assigning Database Migration to Emily Chen (current load: 1)
Final review assignments:
Review ID Project ID Reviewer ID Scheduled Date Status Completion Date
0 1001 P001 U002 2025-05-20 Scheduled
1 1002 P002 U005 2025-06-15 Scheduled
2 1003 P003 U002 2025-06-01 Scheduled
Updated reviewer workload:
User ID Name Current Load
0 U001 John Smith 2
1 U002 Emily Chen 2
2 U003 David Lee 3
3 U004 Sarah Johnson 1
4 U005 Michael Wong 1
Description: Automatically send email notifications to assigned reviewers about upcoming reviews.
Justification: Research shows “providing at least two weeks’ advance notice improves evaluation thoroughness by 45%” (KPMG, 2016).
import pandas as pd
from datetime import datetime
# Create sample users data
users_data = {
'User ID': ['U002', 'U004', 'U005'],
'Name': ['Emily Chen', 'Sarah Johnson', 'Michael Wong'],
'Email': ['emily.chen@company.com', 'sarah.johnson@company.com', 'michael.wong@company.com'],
'Department': ['QA', 'Marketing', 'Research'],
'Current Load': [1, 2, 1]
}
users = pd.DataFrame(users_data)
# Create sample projects data
projects_data = {
'Project ID': ['P001', 'P002', 'P003'],
'Project Name': ['Website Redesign', 'Mobile App Development', 'Database Migration'],
'Department': ['Marketing', 'Development', 'IT'],
'Start Date': ['2024-01-15', '2023-08-22', '2024-02-10']
}
projects = pd.DataFrame(projects_data)
# Create sample review assignments
reviews_data = {
'Review ID': [1001, 1002, 1003],
'Project ID': ['P001', 'P002', 'P003'],
'Reviewer ID': ['U002', 'U005', 'U004'],
'Scheduled Date': ['2025-05-20', '2025-06-15', '2025-06-01'],
'Status': ['Scheduled', 'Scheduled', 'Scheduled'],
'Completion Date': ['', '', '']
}
reviews = pd.DataFrame(reviews_data)
reviews['Scheduled Date'] = pd.to_datetime(reviews['Scheduled Date'])
# Merge data to create a complete view for notifications
merged = reviews.merge(users, left_on="Reviewer ID", right_on="User ID")
merged = merged.merge(projects[["Project ID", "Project Name"]], on="Project ID")
# Display the merged data that will be used for notifications
print("Merged data for email notifications:")
print(merged[['Review ID', 'Project ID', 'Project Name', 'Reviewer ID', 'Name', 'Email', 'Scheduled Date', 'Status']])
# Example of generating notification emails
print("\nExample Email Notifications:")
for _, row in merged.iterrows():
subject = f"Review Assignment: {row['Project Name']}"
body = f"""
Hi {row['Name']},
You have been assigned to review '{row['Project Name']}' scheduled for {row['Scheduled Date'].strftime('%Y-%m-%d')}.
This review is part of our regular quality assessment process.
Please ensure the review is completed on time.
Regards,
Project Scheduler System
"""
print("-" * 50)
print(f"To: {row['Email']}")
print(f"Subject: {subject}")
print(body)
Merged data for email notifications:
Review ID Project ID Project Name Reviewer ID Name \
0 1001 P001 Website Redesign U002 Emily Chen
1 1002 P002 Mobile App Development U005 Michael Wong
2 1003 P003 Database Migration U004 Sarah Johnson
Email Scheduled Date Status
0 emily.chen@company.com 2025-05-20 Scheduled
1 michael.wong@company.com 2025-06-15 Scheduled
2 sarah.johnson@company.com 2025-06-01 Scheduled
Example Email Notifications:
--------------------------------------------------
To: emily.chen@company.com
Subject: Review Assignment: Website Redesign
Hi Emily Chen,
You have been assigned to review 'Website Redesign' scheduled for 2025-05-20.
This review is part of our regular quality assessment process.
Please ensure the review is completed on time.
Regards,
Project Scheduler System
--------------------------------------------------
To: michael.wong@company.com
Subject: Review Assignment: Mobile App Development
Hi Michael Wong,
You have been assigned to review 'Mobile App Development' scheduled for 2025-06-15.
This review is part of our regular quality assessment process.
Please ensure the review is completed on time.
Regards,
Project Scheduler System
--------------------------------------------------
To: sarah.johnson@company.com
Subject: Review Assignment: Database Migration
Hi Sarah Johnson,
You have been assigned to review 'Database Migration' scheduled for 2025-06-01.
This review is part of our regular quality assessment process.
Please ensure the review is completed on time.
Regards,
Project Scheduler System
Description: Use three interconnected CSV files to store all necessary data.
Justification: Current Excel spreadsheets are error-prone. Structured CSV approach allows for “minimal dependencies and infrastructure requirements.”
CSV Schema Design: - Projects.csv (Project_ID, Project_Name, Start_Date, Last_Review_Date, Review_Frequency_Years, Department) - Users.csv (User_ID, Name, Email, Department, Current_Load) - Reviews.csv (Review_ID, Project_ID, Reviewer_ID, Scheduled_Date, Status, Completion_Date)
import pandas as pd
import matplotlib.pyplot as plt
# Create more comprehensive sample user data with varying workloads
users_data = {
'User ID': ['U001', 'U002', 'U003', 'U004', 'U005', 'U006', 'U007', 'U008'],
'Name': ['John Smith', 'Emily Chen', 'David Lee', 'Sarah Johnson',
'Michael Wong', 'Laura Garcia', 'Robert Kim', 'Jennifer Patel'],
'Department': ['Development', 'QA', 'Security', 'Marketing',
'Research', 'IT', 'Development', 'QA'],
'Current Load': [4, 1, 5, 2, 1, 3, 0, 2] # Number of currently assigned reviews
}
# Convert to DataFrame
users = pd.DataFrame(users_data)
# Display the workload data
print("Current reviewer workload data:")
print(users[['Name', 'Department', 'Current Load']])
# Create workload distribution chart
plt.figure(figsize=(10, 6))
user_loads = users[["Name", "Current Load"]].sort_values("Current Load", ascending=False)
plt.bar(user_loads["Name"], user_loads["Current Load"])
plt.title("Current Reviewer Workload Distribution")
plt.xlabel("Reviewer")
plt.ylabel("Number of Assigned Reviews")
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
# Save the chart (would happen in actual implementation)
# plt.savefig("workload_distribution.png")
# Calculate workload statistics
total_reviews = users['Current Load'].sum()
avg_workload = users['Current Load'].mean()
max_workload = users['Current Load'].max()
min_workload = users['Current Load'].min()
print("\nWorkload Distribution Summary:")
print(f"Total assigned reviews: {total_reviews}")
print(f"Average reviews per reviewer: {avg_workload:.2f}")
print(f"Maximum workload: {max_workload} reviews")
print(f"Minimum workload: {min_workload} reviews")
# Example of detailed report text
report_text = f"""
Monthly Workload Distribution Report
Generated: May 7, 2025
Summary:
- Total reviews in progress: {total_reviews}
- Average workload per reviewer: {avg_workload:.2f} reviews
- Reviewers with the highest workload: {users.loc[users['Current Load'] == max_workload, 'Name'].values[0]} ({max_workload} reviews)
- Reviewers with lowest workload: {users.loc[users['Current Load'] == min_workload, 'Name'].values[0]} ({min_workload} reviews)
- Workload variance: {users['Current Load'].var():.2f}
Recommendations:
- Consider reassigning reviews from David Lee (5 reviews) to Robert Kim (0 reviews)
- Monitor workload balance for upcoming review assignments
- Ensure no reviewer exceeds the maximum recommended load of 5 concurrent reviews
"""
print("\nExample Report Text:")
print(report_text)
Current reviewer workload data:
Name Department Current Load
0 John Smith Development 4
1 Emily Chen QA 1
2 David Lee Security 5
3 Sarah Johnson Marketing 2
4 Michael Wong Research 1
5 Laura Garcia IT 3
6 Robert Kim Development 0
7 Jennifer Patel QA 2
Workload Distribution Summary:
Total assigned reviews: 18
Average reviews per reviewer: 2.25
Maximum workload: 5 reviews
Minimum workload: 0 reviews
Example Report Text:
Monthly Workload Distribution Report
Generated: May 7, 2025
Summary:
- Total reviews in progress: 18
- Average workload per reviewer: 2.25 reviews
- Reviewers with highest workload: David Lee (5 reviews)
- Reviewers with lowest workload: Robert Kim (0 reviews)
- Workload variance: 2.79
Recommendations:
- Consider reassigning reviews from David Lee (5 reviews) to Robert Kim (0 reviews)
- Monitor workload balance for upcoming review assignments
- Ensure no reviewer exceeds maximum recommended load of 5 concurrent reviews
png
Description: Validate all data inputs and maintain data integrity across CSV files.
Justification: Current manual process has a “documented 22% error rate.”
import pandas as pd
import numpy as np
from datetime import datetime
# Create sample project data with some validation issues
projects_data = {
'Project ID': ['P001', 'P002', 'P003', 'P004', 'P005', None],
'Project Name': ['Website Redesign', 'Mobile App Development', None, 'Network Security Upgrade', 'AI Implementation', 'Cloud Migration'],
'Start Date': ['2024-01-15', '2023-08-22', '2024-02-10', '2023-11-05', '2023-09-18', '2024-03-01'],
'Last Review Date': ['2024-03-10', '2024-01-15', '2024-03-25', None, '2024-02-05', '2024-04-01'],
'Review Frequency (Years)': [0.25, 0.5, 0.25, 0.5, -0.2, 1.0], # Note the negative value
'Department': ['Marketing', 'Development', 'IT', 'Security', 'Research', 'IT']
}
# Convert to DataFrame
projects = pd.DataFrame(projects_data)
# Try to convert date strings to datetime objects
try:
projects['Start Date'] = pd.to_datetime(projects['Start Date'])
projects['Last Review Date'] = pd.to_datetime(projects['Last Review Date'])
except Exception as e:
print(f"Date conversion error: {e}")
# Display the data with potential validation issues
print("Sample project data (with validation issues):")
print(projects)
# Define the validation function
def validate_project_data(df):
"""Validate project data for required fields and correct data types."""
errors = []
# Check for missing values in required fields
required_fields = ["Project ID", "Project Name", "Start Date", "Last Review Date", "Review Frequency (Years)"]
for field in required_fields:
if df[field].isnull().any():
missing_count = df[field].isnull().sum()
missing_indices = df[df[field].isnull()].index.tolist()
errors.append(f"Missing values in required field: {field} ({missing_count} rows, indices: {missing_indices})")
# Validate date fields
try:
pd.to_datetime(df["Start Date"])
pd.to_datetime(df["Last Review Date"])
except Exception as e:
errors.append(f"Invalid date format: {e}")
# Validate review frequency is positive
if (df["Review Frequency (Years)"] <= 0).any():
negative_indices = df[df["Review Frequency (Years)"] <= 0].index.tolist()
errors.append(f"Review frequency must be positive (found negative/zero values at indices: {negative_indices})")
# Validate logical date sequence (Start Date must be before Last Review Date)
try:
valid_dates_mask = (~df["Start Date"].isnull()) & (~df["Last Review Date"].isnull())
if (df.loc[valid_dates_mask, "Start Date"] > df.loc[valid_dates_mask, "Last Review Date"]).any():
invalid_sequence_indices = df[df["Start Date"] > df["Last Review Date"]].index.tolist()
errors.append(f"Start Date must be before Last Review Date (invalid sequence at indices: {invalid_sequence_indices})")
except Exception as e:
errors.append(f"Error checking date sequence: {e}")
return errors
# Run validation on the sample data
validation_errors = validate_project_data(projects)
# Display validation results
print("\nValidation Results:")
if validation_errors:
print(f"Found {len(validation_errors)} validation errors:")
for i, error in enumerate(validation_errors, 1):
print(f"{i}. {error}")
else:
print("No validation errors found.")
# Example of how validation could be used in practice
print("\nData Validation in Practice:")
if validation_errors:
print("Data validation failed. Please fix the following issues before proceeding:")
for error in validation_errors:
print(f"- {error}")
print("\nProcess halted due to validation errors.")
else:
print("Data validation passed. Proceeding with project review calculations...")
# Code would proceed with normal operations here
Sample project data (with validation issues):
Project ID Project Name Start Date Last Review Date \
0 P001 Website Redesign 2024-01-15 2024-03-10
1 P002 Mobile App Development 2023-08-22 2024-01-15
2 P003 None 2024-02-10 2024-03-25
3 P004 Network Security Upgrade 2023-11-05 NaT
4 P005 AI Implementation 2023-09-18 2024-02-05
5 None Cloud Migration 2024-03-01 2024-04-01
Review Frequency (Years) Department
0 0.25 Marketing
1 0.50 Development
2 0.25 IT
3 0.50 Security
4 -0.20 Research
5 1.00 IT
Validation Results:
Found 4 validation errors:
1. Missing values in required field: Project ID (1 rows, indices: [5])
2. Missing values in required field: Project Name (1 rows, indices: [2])
3. Missing values in required field: Last Review Date (1 rows, indices: [3])
4. Review frequency must be positive (found negative/zero values at indices: [4])
Data Validation in Practice:
Data validation failed. Please fix the following issues before proceeding:
- Missing values in required field: Project ID (1 rows, indices: [5])
- Missing values in required field: Project Name (1 rows, indices: [2])
- Missing values in required field: Last Review Date (1 rows, indices: [3])
- Review frequency must be positive (found negative/zero values at indices: [4])
Process halted due to validation errors.
Description: The system must include comprehensive documentation, including user guides for administrative staff, technical specifications, and inline code comments.
Justification: “Clear documentation of legacy systems is essential when transitioning to automated frameworks” (Hooda et al., 2023).
GETTING STARTED WITH PROJECT REVIEW SCHEDULER
1. OVERVIEW
The Project Review Scheduler helps you automate the tracking,
assignment, and notification of project reviews.
2. MAIN COMMANDS
- calculate_reviews: Identifies which projects need review
- assign_reviewers: Distributes reviews to available reviewers
- send_notifications: Sends email alerts to assigned reviewers
- generate_reports: Creates monthly schedule and workload reports
def generate_documentation():
""" Generate comprehensive system documentation from code and templates."""
# Generate user guide from templates
user_guide = create_user_guide()
# Extract technical documentation from code docstrings
tech_specs = extract_technical_specs()
# Generate installation guide
install_guide = create_installation_guide()
# Save documentation to appropriate locations
for filename, content in documentation_package.items():
save_documentation(filename, content)
image.png