The company featured in this report is a real-world, globally recognized technology firm operating across multiple markets. Due to confidentiality and data-sharing agreements, its name and identifying details have been redacted from this publication.
What can be shared is that the company specializes in large-scale digital platforms — including mobile applications, web-based tools, and cloud-connected services — used daily by millions of users worldwide. With operations spanning North America, Europe, and Asia, the company delivers frequent product updates across Android, iOS, and web platforms to stay competitive in a rapidly evolving market.
This growth-driven pace, while key to innovation, presents operational challenges — particularly in maintaining product stability and customer support efficiency. The rising support ticket backlog analyzed in this report reflects those very challenges, and the findings offer actionable insight into how a high-performing tech organization can realign its resources to sustain both user satisfaction and technical excellence.
The dataset used in this report is a real-world snapshot from a global technology company’s operational logs. While the dataset appears relatively small in size, this is due to the scoped nature of the export — capturing only a representative timeframe rather than the entire historical backlog. This focused slice provides enough context to investigate the rising trend in support tickets without overwhelming processing tools.
Although this dataset could be managed using traditional tools like Excel or WPS Office, I intentionally chose RStudio so that every cleaning, joining, and charting step is captured in reproducible, auditable code.
This report follows a structured workflow—cleaning, joining, analyzing, and visualizing the data to identify key problem areas. Ultimately, the goal is to surface actionable recommendations to reduce the ticket backlog and improve product performance.
In this phase, I gathered and reviewed the available datasets relevant to the problem:
tickets_large.csv: Contains support ticket records, including product, platform, severity, and date reported.
usage_metrics_large.csv: Includes product usage data such as daily active users and error rates.
releases_large.csv: Lists product release dates and related metadata.
During this review, I confirmed that the shared key fields (Date, Product, Platform, Region) are named consistently across files, converted date columns to Date objects, and ensured categorical fields (e.g., product, platform) are factors or character types (sketched below).
The aim of this phase is to ensure the data is clean, well-structured, and analysis-ready before moving into deeper exploration.
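A minimal sketch of that type-normalization step, assuming the raw tickets table has already been read in with the columns listed above:
library(dplyr)
# Normalize types: the report date as Date, descriptive fields as character
tickets <- tickets %>%
mutate(
created_at = as.Date(created_at),
across(c(product, platform, region, ticket_type, severity), as.character)
)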
In this phase, we ensured that the data was clean, reliable, and ready for analysis. This involved checking formats, aligning fields across datasets, and preparing the data for merging and transformation. The focus was on careful inspection to avoid introducing errors or misleading patterns in later steps.
The first task was to load all three datasets and examine their
column names and structures. This helped us understand what each file
contains and ensured that the key fields (like product,
date, platform) are named consistently. Any
mismatch at this stage could cause join failures or missing data later,
so this check is essential before moving forward.
library(readr)
library(dplyr)
library(knitr)
library(kableExtra)
# File paths
path_tickets <- "D:/Capstone_Project/ticket_backlog/raw_data/tickets_large.csv"
path_usage <- "D:/Capstone_Project/ticket_backlog/raw_data/usage_metrics_large.csv"
path_releases <- "D:/Capstone_Project/ticket_backlog/raw_data/releases_large.csv"
# Load datasets
tickets <- read_csv(path_tickets)
usage <- read_csv(path_usage)
releases <- read_csv(path_releases)
# Function to format header as one-row table
format_header <- function(df, title) {
df_head <- as.data.frame(t(colnames(df)))
colnames(df_head) <- NULL
kable(df_head, format = "html", align = "c", caption = title) %>%
kable_styling(full_width = FALSE, position = "center", bootstrap_options = c("striped", "hover", "condensed")) %>%
row_spec(1, bold = TRUE)
}
# Print formatted tables
format_header(tickets, "Tickets Dataset Columns")
| ticket_id | created_at | product | platform | region | ticket_type | severity |
format_header(usage, "Usage Metrics Dataset Columns")
| date | product | platform | region | active_users | error_rate |
format_header(releases, "Releases Dataset Columns")
| release_id | product | version | platform | launch_date | cohort |
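Before moving on, the presence of the shared join keys can also be asserted programmatically. This is a lightweight guard, not part of the original chunks:
# Assert that the join keys exist in every dataset before merging
stopifnot(
all(c("product", "platform", "region", "created_at") %in% colnames(tickets)),
all(c("product", "platform", "region", "date") %in% colnames(usage)),
all(c("product", "platform", "launch_date") %in% colnames(releases))
)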
In this step, we performed essential structural validation across the three datasets before applying any transformations. We checked for missing values in critical columns such as identifiers and timestamps, scanned for duplicated records and IDs, and inspected the data types to ensure consistency. This process helps detect early warning signs that could cause silent failures during joins or aggregations. Ensuring structural integrity at this stage builds a reliable foundation for accurate and efficient analysis.
library(dplyr)
# 1. Check for missing values in key columns
colSums(is.na(tickets[c("ticket_id", "created_at", "product")]))
colSums(is.na(usage[c("date", "product", "active_users")]))
colSums(is.na(releases[c("release_id", "launch_date", "product")]))
# 2. Check for duplicates
sum(duplicated(tickets)) # full row duplicates
sum(duplicated(tickets$ticket_id)) # duplicate ticket IDs
sum(duplicated(usage)) # full row duplicates
sum(duplicated(releases)) # full row duplicates
sum(duplicated(releases$release_id)) # duplicate release IDs
# 3. Glimpse of data types for manual validation
glimpse(tickets)
glimpse(usage)
glimpse(releases)
Data Structure Summary
Tickets Dataset (311 rows, 7 columns): Contains
individual support tickets with a unique ticket_id, a
created_at report date, and details such as product,
platform, region, ticket_type,
and severity. The data types are appropriate: IDs as
numeric, dates parsed correctly, and descriptive fields as character
strings.
Usage Metrics Dataset (120 rows, 6 columns):
Daily usage logs showing number of active_users and
error_rate per product, platform,
and region. All values are parsed correctly, and the
date column is in proper date format.
Releases Dataset (120 rows, 6 columns): Product
version releases with release_id, product,
version, platform, launch_date, and a
descriptive cohort. Dates are parsed well, and all fields
appear clean and consistent.
➡️ Conclusion: All three datasets show clean structure, proper parsing of key fields (especially dates), and no missing values or duplicates in the columns we checked. This gives us confidence to proceed with merging and deeper analysis in the next steps.
Before merging, we validated alignment across the datasets using
shared dimensions: product, platform, and
region. We checked if all combinations present in one
dataset also exist in the others and ensured ticket dates fall within
the usage and release date ranges. No mismatches or gaps were found,
indicating strong consistency and readiness for safe joining in the next
step.
library(dplyr)
library(knitr)
library(kableExtra)
library(readr)
# Load data
tickets <- read_csv("D:/Capstone_Project/ticket_backlog/raw_data/tickets_large.csv", show_col_types = FALSE)
usage <- read_csv("D:/Capstone_Project/ticket_backlog/raw_data/usage_metrics_large.csv", show_col_types = FALSE)
releases <- read_csv("D:/Capstone_Project/ticket_backlog/raw_data/releases_large.csv", show_col_types = FALSE)
# Distinct key combinations
ticket_keys <- tickets %>% distinct(product, platform, region)
usage_keys <- usage %>% distinct(product, platform, region)
release_keys <- releases %>% distinct(product, platform)
# Anti-joins to find mismatches
ticket_missing_in_usage <- anti_join(ticket_keys, usage_keys, by = c("product", "platform", "region"))
usage_missing_in_tickets <- anti_join(usage_keys, ticket_keys, by = c("product", "platform", "region"))
# Print mismatches if any
if (nrow(ticket_missing_in_usage) > 0) {
kable(ticket_missing_in_usage, caption = "Combinations in Tickets Missing in Usage") %>%
kable_styling(full_width = FALSE, position = "center")
} else {
cat("✅ All ticket combinations are present in usage metrics.\n\n")
}
## ✅ All ticket combinations are present in usage metrics.
if (nrow(usage_missing_in_tickets) > 0) {
kable(usage_missing_in_tickets, caption = "Combinations in Usage Missing in Tickets") %>%
kable_styling(full_width = FALSE, position = "center")
} else {
cat("✅ All usage combinations are covered in tickets dataset.\n\n")
}
## ✅ All usage combinations are covered in tickets dataset.
# Date range checks
ticket_range <- range(tickets$created_at)
usage_range <- range(usage$date)
release_range <- range(releases$launch_date)
cat("📅 Ticket Date Range: ", ticket_range[1], " to ", ticket_range[2], "\n")
## 📅 Ticket Date Range: 20179 to 20238
cat("📅 Usage Metrics Date Range: ", usage_range[1], " to ", usage_range[2], "\n")
## 📅 Usage Metrics Date Range: 20179 to 20238
cat("📅 Release Launch Date Range: ", release_range[1], " to ", release_range[2], "\n")
## 📅 Release Launch Date Range: 20034 to 20268
Step 3 Summary:
We performed checks to validate the alignment of the key categorical
fields (product, platform,
region) across the three datasets. Using anti-joins, we
confirmed that all combinations matched, and no anomalies were found.
Additionally, we verified the date ranges: the tickets and usage
metrics cover the same window (2025-04-01 to 2025-05-30), and the
release window (2024-11-07 to 2025-06-29) fully spans the ticket
period.
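The same containment check can be expressed in code, using the objects already loaded above:
# Confirm every ticket date falls inside the usage-metrics window
stopifnot(
min(tickets$created_at) >= min(usage$date),
max(tickets$created_at) <= max(usage$date)
)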
In this phase, we carefully merged the three datasets using
consistent keys: product, platform,
region, and created_at (aligned with
date). We ensured a left join strategy to
preserve the integrity of the ticket records. Final checks confirmed
that no NA values were introduced by either join and that every
ticket record was preserved.
# Join tickets with usage metrics (left join to keep all tickets)
tickets_usage <- tickets %>%
left_join(usage, by = c("product", "platform", "region", "created_at" = "date"))
# Check for any NAs introduced
missing_usage_rows <- tickets_usage %>% filter(is.na(active_users) | is.na(error_rate))
# Join with releases data (left join to retain full ticket context)
tickets_full <- tickets_usage %>%
left_join(releases, by = c("product", "platform"))
# Check for NAs after final join
missing_release_rows <- tickets_full %>% filter(is.na(version) | is.na(launch_date))
# Print status
if (nrow(missing_usage_rows) == 0) {
cat("✅ No NAs introduced from usage metrics join.\n\n")
} else {
cat("⚠️ NAs found after joining with usage metrics. Please inspect.\n")
}
## ✅ No NAs introduced from usage metrics join.
if (nrow(missing_release_rows) == 0) {
cat("✅ No NAs introduced from releases join.\n\n")
} else {
cat("⚠️ NAs found after joining with releases. Please inspect.\n")
}
## ✅ No NAs introduced from releases join.
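One additional guard worth noting, as a sketch rather than part of the original pipeline: because the releases join uses only product and platform, a pair with several releases would silently duplicate ticket rows, so comparing row counts before and after each join catches unintended expansion.
# Row-count guard: joins should not expand the ticket records
stopifnot(nrow(tickets_usage) == nrow(tickets)) # usage join is expected to be 1:1
if (nrow(tickets_full) > nrow(tickets_usage)) {
cat("⚠️ Releases join expanded rows by", nrow(tickets_full) - nrow(tickets_usage), "\n")
}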
Step 4 Summary:
We now have a unified dataset, ready for analysis.
In this phase, we focus on uncovering the patterns behind the growing ticket backlog. We begin by examining daily ticket trends to confirm whether the backlog is increasing over time. Then, we break down the volume of tickets by product and by region to determine which specific systems and geographical areas are most affected. These analyses will help isolate the root contributors to the congestion and provide evidence to support operational decisions. This phase forms the backbone of our insight-generation process, grounding conclusions in clean, visualized data.
library(dplyr)
library(ggplot2)
# Load cleaned tickets dataset
tickets <- read.csv("D:/Capstone_Project/ticket_backlog/raw_data/tickets_large.csv")
# Ensure proper date format
tickets$created_at <- as.Date(tickets$created_at)
# 1. Daily ticket volume
daily_tickets <- tickets %>%
count(created_at)
ggplot(daily_tickets, aes(x = created_at, y = n)) +
geom_line(color = "steelblue") +
labs(title = "Daily Ticket Volume", x = "Date", y = "Number of Tickets") +
theme_minimal()
Chart 1: Daily Ticket Volume The trend shows a steady increase in daily ticket submissions, especially noticeable after early April 2025. This sustained rise in incoming tickets is what drives the growing backlog.
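To put a number on that upward trend, a simple linear fit over the daily counts estimates the average growth in tickets per day. This is a sketch; the slope value depends on the actual export:
# Fit a linear trend: the slope estimates tickets added per day
trend_fit <- lm(n ~ created_at, data = daily_tickets)
coef(trend_fit)[["created_at"]] # positive slope confirms rising daily volume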
# 2. Ticket count by product
product_tickets <- tickets %>%
count(product, sort = TRUE)
ggplot(product_tickets, aes(x = reorder(product, n), y = n, fill = product)) +
geom_col(show.legend = FALSE) +
coord_flip() +
labs(title = "Ticket Count by Product", x = "Product", y = "Tickets") +
theme_minimal()
Chart 2: Ticket Count by Product Product Y stands out as the major source of tickets, contributing the highest volume among all products. This reinforces our hypothesis that Product Y is a key driver of the backlog.
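A quick share calculation makes that concentration explicit, built on the counts computed above:
# Express each product's ticket count as a share of the total
product_tickets %>%
mutate(share_pct = round(100 * n / sum(n), 1))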
# 3. Ticket count by region
region_tickets <- tickets %>%
count(region, sort = TRUE)
ggplot(region_tickets, aes(x = reorder(region, n), y = n, fill = region)) +
geom_col(show.legend = FALSE) +
coord_flip() +
labs(title = "Ticket Count by Region", x = "Region", y = "Tickets") +
theme_minimal()
Chart 3: Ticket Count by Region The US region is more heavily impacted compared to others, indicating that the backlog is more severe in this area. This may reflect regional user volume or support limitations.
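To test whether the US spike simply tracks user volume, ticket counts can be normalized by average active users from the usage dataset. A sketch, assuming usage is still loaded from the earlier chunks:
# Tickets per 1,000 average active users, by region
region_norm <- tickets %>%
count(region, name = "tickets") %>%
left_join(usage %>% group_by(region) %>% summarise(avg_users = mean(active_users)),
by = "region") %>%
mutate(tickets_per_1k_users = round(1000 * tickets / avg_users, 2))
region_norm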
library(dplyr)
library(ggplot2)
# Filter for Product Y only
product_y_data <- tickets %>%
filter(product == "Y")
# Count tickets by platform for Product Y
platform_issues_y <- product_y_data %>%
group_by(platform) %>%
summarise(ticket_count = n()) %>%
arrange(desc(ticket_count))
# Plot
ggplot(platform_issues_y, aes(x = reorder(platform, -ticket_count), y = ticket_count, fill = platform)) +
geom_col(show.legend = FALSE) +
labs(
title = "Ticket Count by Platform for Product Y",
x = "Platform",
y = "Number of Tickets"
) +
theme_minimal()
Chart 4: Ticket Count by Platform for Product Y:
Android is the most affected platform for Product Y, registering the
highest ticket count of any platform.
Why Proceed to Further Analysis? After the initial four charts, our investigation clearly pinpointed Product Y as the major contributor to the growing ticket backlog. Even more specifically, Android is the most affected platform for this product, accounting for the largest share of its tickets. This insight is critical, but it doesn't yet explain why these issues are occurring.
To deepen our understanding, we take advantage of a key variable: launch_date from the releases dataset. We now shift our focus to correlating ticket spikes with deployment activity. The hypothesis is straightforward:
“Are recent product releases — particularly for Product Y on Android — triggering the spike in tickets?”
By aligning the ticket timeline against the release schedule, we aim to:
Detect any temporal relationships between deployments and bug reports.
Identify specific release versions that may have introduced regressions or performance issues.
Inform development and QA teams whether release timing matches issue surges.
This deeper phase of analysis helps transform the investigation from “who and what is affected?” to “what likely caused it, and when?”
library(dplyr)
library(ggplot2)
library(readr)
# Load the datasets
tickets <- read_csv("D:/Capstone_Project/ticket_backlog/raw_data/tickets_large.csv", show_col_types = FALSE)
releases <- read_csv("D:/Capstone_Project/ticket_backlog/raw_data/releases_large.csv", show_col_types = FALSE)
# Filter tickets for Product Y on Android
tickets_y_android <- tickets %>%
filter(product == "Y", platform == "Android") %>%
mutate(created_at = as.Date(created_at))
# Filter relevant releases for Product Y on Android
releases_y_android <- releases %>%
filter(product == "Y", platform == "Android") %>%
mutate(launch_date = as.Date(launch_date))
# Daily ticket volume for Product Y on Android
daily_y_android <- tickets_y_android %>%
count(created_at)
# Plot ticket volume with release dates overlaid
ggplot(daily_y_android, aes(x = created_at, y = n)) +
geom_line(color = "firebrick", linewidth = 1) +
geom_vline(data = releases_y_android, aes(xintercept = launch_date),
linetype = "dashed", color = "steelblue") +
labs(
title = "Ticket Volume for Product Y on Android vs. Release Dates",
x = "Date",
y = "Number of Tickets"
) +
theme_minimal()
Chart 5: Ticket Volume for Product Y on Android vs. Release Dates This chart overlays daily ticket counts for Product Y on Android with vertical dashed lines marking release launch dates. The goal is to determine whether the influx of tickets aligns closely with new deployments.
Findings
There is a visible uptick in ticket submissions shortly after several key release dates, especially in mid- to late-April.
This pattern suggests a temporal correlation between releases and system instability or bugs, reinforcing the hypothesis that recent updates might have introduced issues (see the sketch below).
The development and QA teams should prioritize post-release regression testing, particularly for Android releases of Product Y.
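To move beyond visual inspection, the same temporal correlation can be quantified by counting tickets in a fixed window after each release. This is a sketch with a hypothetical 7-day window, reusing the objects from the chunk above:
# Count Product Y / Android tickets within 7 days after each release
post_release_load <- releases_y_android %>%
rowwise() %>%
mutate(tickets_within_7d = sum(tickets_y_android$created_at >= launch_date &
tickets_y_android$created_at < launch_date + 7)) %>%
ungroup() %>%
select(version, launch_date, tickets_within_7d) %>%
arrange(desc(tickets_within_7d))
post_release_load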
In this final phase, actionable recommendations were developed based on the insights uncovered throughout the analysis. These actions aim to resolve the root causes of the growing ticket backlog and enhance product stability and user satisfaction.
By executing these targeted actions, the organization can better manage ticket volume, reduce user friction, and drive long-term improvements in product quality and operational agility.
Imagine a playground with three big slides — the products. At first, everything runs smoothly. When a slide gets bumpy or someone gets stuck, a few helpers (support staff) fix things fast.
But soon, more and more kids (users) arrive every day. The playground is packed. Scrapes and stuck slides (bugs and crashes) become common. Meanwhile, the playground team adds cool new features (releases), but some weren’t tested thoroughly — and they introduce new problems.
The helpers try their best, but they’re overwhelmed. Too many issues, too fast, and not enough hands to help. That’s how a backlog grows — not from neglect, but from unchecked volume and strain.
The data confirms this: rising user activity, increased ticket volume, and new releases aligning with spikes in support demand — all pointing to the need for smarter scaling and preparedness.