Purpose of This Document

This document is intended to help leadership review, shape, and approve a work-related capstone project for my Data Technology program. My goal is to complete a project that satisfies the program requirements while also creating something useful for the business.

The capstone must follow a four-step process:

  1. Find and evaluate a dataset
  2. Clean the data and explore it through exploratory data analysis (EDA)
  3. Analyze the data to answer business questions
  4. Package the final work into a professional portfolio-style deliverable

Because of that structure, the strongest project would use a manageable internal dataset, answer a real business question, and, if approved, be shared externally only in anonymized or redacted form.

What the School Is Looking For

My capstone project is expected to demonstrate that I can:

The final project is meant to tell a clear story: define a problem, explain the data and methodology, answer business questions with analysis and visuals, and summarize recommendations or conclusions.

Capstone Framework Required by the Program

Step 1: Find a Dataset

The first part of the capstone is selecting a dataset or combination of datasets that are relevant, interesting, and manageable in size. The data should be large enough to support meaningful analysis, but not so large that it becomes difficult to clean, explore, and analyze within the course timeline.

The school recommends datasets that are ideally under 20 MB, with at least a few hundred rows and at least five columns. More detail is better, but extremely large datasets are discouraged.
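A quick way to confirm that a candidate export meets these guidelines is to check its file size and shape before committing to it. The sketch below assumes the data arrives as a CSV file and uses pandas. The function names are hypothetical, and the 300-row floor is an assumed reading of "at least a few hundred rows."

```python
import os

import pandas as pd

# Thresholds from the school's guidance. MIN_ROWS = 300 is an assumed
# reading of "at least a few hundred rows"; adjust if the program is specific.
MAX_MB = 20
MIN_ROWS = 300
MIN_COLS = 5


def check_frame(df: pd.DataFrame, size_mb: float) -> dict:
    """Compare a loaded table and its file size against the dataset guidelines."""
    n_rows, n_cols = df.shape
    return {
        "size_mb": round(size_mb, 2),
        "rows": n_rows,
        "cols": n_cols,
        "size_ok": size_mb < MAX_MB,
        "rows_ok": n_rows >= MIN_ROWS,
        "cols_ok": n_cols >= MIN_COLS,
    }


def check_csv(path: str) -> dict:
    """Load a CSV export (hypothetical path) and check it against the guidelines."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return check_frame(pd.read_csv(path), size_mb)
```

For example, `check_csv("orders_export.csv")` on a hypothetical internal export would return the row and column counts along with a pass/fail flag for each guideline.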

For this project, I would prefer to use internal business data because it would allow the capstone to create real value for the company while also meeting the course requirements.

What Leadership Approval Would Help Define

Leadership input would help determine:

  • Which business problem would be most useful to analyze
  • Which internal datasets would be appropriate to use
  • What time period should be included
  • What confidentiality boundaries should apply
  • What can and cannot be included in any external-facing version

Step 2: Dataset Preparation and EDA

Once a dataset is selected, the next phase is cleaning and preparing the data, then exploring it through EDA.

This step includes:

  • Identifying missing or null values
  • Standardizing inconsistent values, formats, or naming conventions
  • Removing or addressing duplicate data
  • Summarizing key variables
  • Building charts and graphs to explore distributions, relationships, trends, and anomalies
  • Documenting any issues found and how they were handled
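The cleaning steps above can be sketched in pandas as a function that returns both the cleaned table and a plain-language log, so each issue and how it was handled is captured for the documentation. The column names (`order_id`, `channel`, `order_total`) and the choice to drop rows missing an ID or total are illustrative assumptions, not decisions already made.

```python
import pandas as pd


def clean_orders(raw: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]:
    """Clean a raw orders table; return the cleaned table and a cleaning log."""
    df = raw.copy()
    log: list[str] = []

    # Standardize column naming, e.g. "Order ID" -> "order_id".
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

    # Identify missing values per column and record the counts.
    nulls = df.isna().sum()
    for col, n in nulls[nulls > 0].items():
        log.append(f"{col}: {n} missing value(s) found")

    # Standardize inconsistent labels, e.g. " Email" and "email " -> "email".
    df["channel"] = df["channel"].str.strip().str.lower()
    log.append("channel: trimmed whitespace and lowercased labels")

    # Remove duplicate orders, keeping the first occurrence of each order_id.
    before = len(df)
    df = df.drop_duplicates(subset="order_id", keep="first")
    log.append(f"order_id: removed {before - len(df)} duplicate row(s)")

    # Drop rows missing fields the analysis cannot work without.
    before = len(df)
    df = df.dropna(subset=["order_id", "order_total"])
    log.append(f"dropped {before - len(df)} row(s) missing order_id or order_total")

    # Summarize a key variable for the EDA write-up.
    summary = df["order_total"].describe()
    log.append(
        f"order_total after cleaning: mean {summary['mean']:.2f}, max {summary['max']:.2f}"
    )
    return df, log
```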

This step is important because the project will be graded not just on the final insights, but also on whether the cleaning and exploratory process is thorough, thoughtful, and well documented.

Step 3: Analysis

The analysis phase is the core of the capstone. This is where I would answer the business questions that guide the project.

Depending on the topic, this could include:

  • Trend analysis over time
  • Comparative analysis across channels, products, or customer groups
  • Correlation analysis between variables
  • Segmentation analysis
  • Optional predictive modeling if it adds value and remains manageable
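As a rough illustration of the first three analysis types, the sketch below computes a monthly revenue trend, a channel comparison, and a correlation between discount and order total using pandas. The column names are hypothetical, and a correlation on its own would show association, not cause.

```python
import pandas as pd


def monthly_revenue(orders: pd.DataFrame) -> pd.Series:
    """Trend analysis: total revenue per calendar month."""
    month = orders["order_date"].dt.to_period("M")
    return orders.groupby(month)["order_total"].sum()


def channel_comparison(orders: pd.DataFrame) -> pd.DataFrame:
    """Comparative analysis: order count, revenue, and average order value by channel."""
    return orders.groupby("channel")["order_total"].agg(
        orders="count", revenue="sum", avg_order_value="mean"
    )


def discount_correlation(orders: pd.DataFrame) -> float:
    """Correlation analysis: Pearson correlation between discount and order total."""
    return orders["discount"].corr(orders["order_total"])
```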

This is the stage where the project needs to move beyond reporting and into interpretation. The goal is not just to describe what happened, but to explain what the findings may mean for the business.

Step 4: Putting It All Together

The final phase is turning the analysis into a polished portfolio piece. The school expects the project to be made publicly accessible through GitHub or a website and to include a structured final report.

That means the project must ultimately include:

  • A clear problem statement
  • Business questions
  • Data cleaning documentation
  • Analysis with visualizations
  • A conclusion
  • A README or project overview
  • A portfolio-friendly presentation of the work

If company data is used, the public version would need to be anonymized, redacted, or otherwise adapted so that it protects confidential information.
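One possible way to prepare a public version is sketched below, assuming the data sits in a pandas DataFrame: direct identifiers are dropped, customer IDs are replaced with salted hashes, and revenue is rescaled to an index so absolute dollar amounts are not exposed. The column names and the salt are hypothetical. Salted hashing is pseudonymization rather than full anonymization, so the salt would stay internal and the output would still go through internal review.

```python
import hashlib

import pandas as pd

# Hypothetical direct-identifier columns to remove before any public release.
PII_COLUMNS = ["customer_name", "email"]


def pseudonymize_id(value, salt: str) -> str:
    """Replace an ID with a salted SHA-256 hash, truncated to 12 hex characters."""
    return hashlib.sha256(f"{salt}{value}".encode("utf-8")).hexdigest()[:12]


def anonymize(df: pd.DataFrame, salt: str, revenue_col: str = "order_total") -> pd.DataFrame:
    """Drop identifiers, hash customer IDs, and index revenue to its mean."""
    out = df.drop(columns=[c for c in PII_COLUMNS if c in df.columns])
    # The same customer always maps to the same hash, so repeat behavior is preserved.
    out["customer_id"] = out["customer_id"].map(lambda v: pseudonymize_id(v, salt))
    # Relative patterns survive (1.0 = average), absolute amounts do not.
    out[revenue_col] = out[revenue_col] / out[revenue_col].mean()
    return out
```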

Possible Internal Project Options for Approval

Below are several capstone directions leadership could approve or refine.

Option 1: Marketing Channel Performance and Customer Value

Main question: Which channels and campaign types bring in the most valuable customers?

Potential data sources:

  • Order data
  • Customer data
  • Product or category data
  • Source, medium, campaign, or channel grouping fields
  • Optional ad spend data

Potential outputs:

  • Channel comparison dashboard
  • Customer value summary by source
  • Recommendations on channel mix and investment focus
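For Option 1, a minimal sketch of the core calculation follows, assuming order-level data with a channel field on each order. It credits each customer to the channel of their first order (a first-touch assumption that leadership may want to change) and then compares customer count, revenue, and repeat rate by channel. Column names are hypothetical.

```python
import pandas as pd


def channel_customer_value(orders: pd.DataFrame) -> pd.DataFrame:
    """Summarize customer value by the channel that acquired each customer.

    Assumes first-touch attribution at the customer level: a customer is
    credited to the channel of their earliest order.
    """
    acquisition = (
        orders.sort_values("order_date")
        .drop_duplicates("customer_id", keep="first")
        .set_index("customer_id")["channel"]
        .rename("acquisition_channel")
    )
    per_customer = (
        orders.groupby("customer_id")
        .agg(order_count=("order_id", "nunique"), revenue=("order_total", "sum"))
        .join(acquisition)
    )
    summary = per_customer.groupby("acquisition_channel").agg(
        customers=("revenue", "count"),
        total_revenue=("revenue", "sum"),
        avg_customer_revenue=("revenue", "mean"),
        # Share of acquired customers who placed more than one order.
        repeat_rate=("order_count", lambda s: (s > 1).mean()),
    )
    return summary.sort_values("avg_customer_revenue", ascending=False)
```

If ad spend data is approved, it could be joined to this summary by channel to estimate cost per acquired customer.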

Option 2: Customer Segmentation and Repeat Purchase Analysis

Main question: Which customer segments are most associated with repeat purchasing and long-term value?

Potential data sources:

  • Customer order history
  • Product categories purchased
  • First purchase date
  • Repeat order count
  • Revenue per customer

Potential outputs:

  • Segment profiles
  • Repeat purchase analysis
  • Retention opportunity summary
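For Option 2, the sketch below builds per-customer metrics from order history, assigns a simple rule-based segment, and then profiles each segment. The 500 revenue threshold, the segment names, and the column names are hypothetical placeholders; a statistical approach such as RFM scoring or clustering could replace the rules if it suits the data better.

```python
import pandas as pd

# Hypothetical threshold; a real cut-off would be agreed with the business.
HIGH_VALUE_REVENUE = 500.0


def segment_customers(orders: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Build per-customer metrics and assign a simple rule-based segment."""
    cust = orders.groupby("customer_id").agg(
        first_purchase=("order_date", "min"),
        last_purchase=("order_date", "max"),
        order_count=("order_id", "nunique"),
        revenue=("order_total", "sum"),
    )
    # Recency in days, useful for spotting lapsed repeat buyers.
    cust["days_since_last"] = (as_of - cust["last_purchase"]).dt.days

    def label(row: pd.Series) -> str:
        if row["order_count"] == 1:
            return "one-time"
        if row["revenue"] >= HIGH_VALUE_REVENUE:
            return "high-value repeat"
        return "repeat"

    cust["segment"] = cust.apply(label, axis=1)
    return cust


def segment_profile(cust: pd.DataFrame) -> pd.DataFrame:
    """Profile each segment: size, average revenue, and average recency."""
    return cust.groupby("segment").agg(
        customers=("revenue", "count"),
        avg_revenue=("revenue", "mean"),
        avg_days_since_last=("days_since_last", "mean"),
    )
```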

Option 3: Product Mix and Revenue Contribution Analysis

Main question: Which products, product categories, or starter kits drive the strongest revenue contribution and follow-on purchasing behavior?

Potential data sources:

  • Order line item data
  • Product data
  • Product category or product family data
  • Bundle or kit indicators

Potential outputs:

  • Product mix dashboard
  • Category performance report
  • Recommendations on merchandising or product focus
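For Option 3, two calculations seem central: each category's share of total revenue, and whether customers whose first order included a starter kit go on to buy again more often than other customers. The sketch assumes order-line data with hypothetical `quantity`, `unit_price`, `category`, and boolean `is_kit` columns.

```python
import pandas as pd


def category_contribution(lines: pd.DataFrame) -> pd.DataFrame:
    """Revenue, share of total, and cumulative share by product category."""
    revenue = lines["quantity"] * lines["unit_price"]
    by_cat = revenue.groupby(lines["category"]).sum().sort_values(ascending=False)
    share = by_cat / by_cat.sum()
    return pd.DataFrame({"revenue": by_cat, "share": share, "cum_share": share.cumsum()})


def kit_follow_on_rate(lines: pd.DataFrame) -> pd.Series:
    """Repeat-purchase rate, split by whether the customer's first order had a kit."""
    orders = lines.groupby("order_id").agg(
        customer_id=("customer_id", "first"),
        order_date=("order_date", "first"),
        has_kit=("is_kit", "max"),  # True if any line in the order is a kit
    )
    first_orders = orders.sort_values("order_date").drop_duplicates("customer_id")
    order_counts = orders.groupby("customer_id").size()
    is_repeat = first_orders["customer_id"].map(order_counts) > 1
    return is_repeat.groupby(first_orders["has_kit"]).mean()
```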

Option 4: Event, Lead, or Registration Performance Analysis

Main question: Which event-related efforts or lead sources appear to drive stronger registrations, attendance, or downstream value?

Potential data sources:

  • Event registrations
  • Lead source data
  • Follow-up outcomes
  • Customer or order data if applicable

Potential outputs:

  • Lead source comparison
  • Event performance summary
  • Follow-up or conversion insights
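For Option 4, a funnel summary by lead source is one straightforward starting point. The sketch assumes one row per registration with hypothetical boolean `attended` and `converted` columns, where "converted" would need an agreed business definition. Rates from sources with only a handful of registrations will be noisy and should be read with caution.

```python
import pandas as pd


def lead_source_funnel(regs: pd.DataFrame) -> pd.DataFrame:
    """Registrations, attendance, and conversion rates by lead source."""
    summary = regs.groupby("lead_source").agg(
        registrations=("registration_id", "count"),
        attended=("attended", "sum"),    # count of True values
        converted=("converted", "sum"),  # count of True values
    )
    summary["attendance_rate"] = summary["attended"] / summary["registrations"]
    summary["conversion_rate"] = summary["converted"] / summary["registrations"]
    return summary.sort_values("conversion_rate", ascending=False)
```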

Example Data Fields That May Be Useful

If Option 1 (Marketing Channel Performance and Customer Value) is approved, likely useful fields would include:

Orders

  • Order ID
  • Order date
  • Customer ID
  • Order total or revenue
  • Discount used
  • Product or category purchased

Customers

  • Customer ID
  • First purchase date
  • Repeat order count
  • Total customer revenue

Marketing Attribution

  • Source
  • Medium
  • Campaign
  • Channel grouping

Products

  • Product category
  • Product family
  • Bundle or starter kit indicator
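Two preparation steps follow from these fields: the Customers fields can be derived directly from Orders, and the attribution fields can be joined to orders by order ID. The sketch below assumes snake_case column names and defines repeat order count as every order after the first; both are assumptions to confirm against how the business already counts repeats.

```python
import pandas as pd


def build_customer_table(orders: pd.DataFrame) -> pd.DataFrame:
    """Derive the Customers fields from order-level data."""
    cust = orders.groupby("customer_id").agg(
        first_purchase_date=("order_date", "min"),
        order_count=("order_id", "nunique"),
        total_customer_revenue=("order_total", "sum"),
    )
    # Assumed definition: repeat orders are all orders after the first one.
    cust["repeat_order_count"] = cust["order_count"] - 1
    return cust.drop(columns="order_count")


def attach_attribution(orders: pd.DataFrame, attribution: pd.DataFrame) -> pd.DataFrame:
    """Left-join source/medium/campaign/channel fields onto orders by order ID.

    validate="one_to_one" raises an error if either table repeats an order_id,
    which would otherwise duplicate order rows and inflate revenue.
    """
    return orders.merge(attribution, on="order_id", how="left", validate="one_to_one")
```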

Confidentiality and Approval Considerations

Because this project may involve internal business data, I want to be careful about confidentiality and external sharing.

My intent would be:

Why This Could Be Valuable to the Company

If approved, this project could create value beyond the school requirement by producing a structured analysis that may help with:

The goal is to create something that is academically valid and also practically useful.

What I Am Asking Leadership to Help Decide

I would appreciate leadership input on the following:

  1. Which project direction would be most valuable to the business
  2. Which dataset or combination of datasets would be most appropriate to use
  3. What confidentiality boundaries should apply
  4. Whether a redacted public-facing version would be acceptable if reviewed internally first

Proposed Next Step

If leadership is open to this, the next step would be to choose one project direction and define:

Closing

I would like this capstone to be more than a school exercise. My hope is to use it as an opportunity to create a thoughtful and useful analysis that supports the business while also meeting the academic requirements of my Data Technology program.