PRIME Complaints Analytics

Author

Nnaemeka Onyebueke (2025-MMBA-8-064)

Published

May 11, 2026

GitHub Repository: https://github.com/NnaemekaOnyebueke/PRIME-complaints-analytics

1. Executive Summary

This report provides a deep dive into our educational technology complaints system, aiming to improve how we resolve issues and manage our support team. We analyzed over 3,000 support tickets from more than 210 schools using five advanced data techniques: Text Analysis (to understand common issues and user feelings), Monte Carlo Simulation (to predict resolution times), Advanced Forecasting (to anticipate future workload), Survival Analysis (to see how long tickets stay open), and Association Rule Mining (to find links between issues and agents).

Our key findings show that half of all tickets are resolved within about 15 days, but a long tail takes much longer, with the worst cases stretching past 240 days. Text analysis helped us pinpoint specific problems like “damaged screen” and “battery drain” as major drivers of complaints. User satisfaction, while generally stable, has shown a slight dip. Looking ahead, we expect a 15% increase in ticket volume next quarter, with Mondays being particularly busy. We also found that even urgent “Priority 1” tickets can get delayed due to resource limitations.

Our main recommendation is to set up an automated system to quickly sort tickets based on common patterns we’ve identified. We also suggest specialized training for our support agents to handle frequently occurring issues, which should help reduce long resolution times and prepare us for the expected increase in demand.

2. Professional Disclosure

My Role & Organization: I, Nnaemeka Onyebueke, am a Project Director at a large educational technology organization in Nigeria. We partner with state governments, including Lagos, Jigawa, and Bayelsa, to use technology and modern teaching methods to improve public education.

My job involves tracking complaints from schools, ensuring we meet our service promises (SLAs), making our resolution processes more efficient, and providing timely and effective technical support to our field staff and schools.

Why These Techniques Matter:

  • Text Analytics: Helps us understand what users are saying in their own words, identifying common technical problems and operational glitches in devices.

  • Monte Carlo Simulation: Crucial for predicting how long it might take to resolve issues, helping us set realistic service targets.

  • Advanced Forecasting: Essential for planning our workforce, ensuring we have enough agents available during busy periods.

  • People Analytics (Survival Analysis): Shows us how long tickets typically remain open and helps identify which types of urgent tickets might be getting delayed.

  • Association Rules: Uncovers hidden connections between different types of issues and the agents who handle them best, allowing for smarter task assignment.

3. Data Collection & Sampling

Source: The data for this report comes from our organization’s internal MantisBT ticket management system.

Collection Method: I downloaded all tickets submitted between January 1 and April 30, 2026, directly from the MantisBT complaints system as a CSV file.

Who Provided the Data: The dataset includes all complaints reported by school leaders, supervisors, and other field staff across more than 210 academy locations in Bayelsa State.

Data Size: We analyzed 3,435 records, each with 24 fields.

Time Period Covered: The data spans from 2025-05-02 to 2026-04-23.

Privacy Notes: To protect privacy, all personal details, such as names of reporters and agents, have been replaced with anonymous identifiers (e.g., Agent_1, User_1). No sensitive financial or private user data was included in this analysis.
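
For transparency, the anonymization can be reproduced at load time. The sketch below is illustrative only: the file name mantis_tickets.csv and the Reporter and Agent column names are assumptions, not the exact MantisBT export schema.

Code
library(readr)
library(dplyr)

# Load the raw export (hypothetical file name)
tickets_raw <- read_csv("mantis_tickets.csv")

# Replace identifying names with stable pseudonyms (Agent_1, User_1, ...)
tickets <- tickets_raw %>%
  mutate(
    Agent    = paste0("Agent_", as.integer(factor(Agent))),
    Reporter = paste0("User_",  as.integer(factor(Reporter)))
  )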

4. Data Description

Our dataset contains both structured information (like the priority, status, and type of complaint) and unstructured text (like the summary and detailed description of the issue). The table below provides a statistical overview of these key variables.

Code
# Create a clean summary table for data description
data_summary <- tickets %>%
  dplyr::select(Priority, Status, Classification, Resolution_Days) %>%
  summary()

# Display as a formatted table (suppress the placeholder row names)
knitr::kable(as.data.frame(unclass(data_summary)), row.names = FALSE,
             caption = "Statistical Summary of Key Variables")
Statistical Summary of Key Variables
Priority          Status            Classification     Resolution_Days
Length: 3435      Length: 3435      Length: 3435       Min.   :   0.00
N.unique: 2       N.unique: 5       N.unique: 9        1st Qu.:   5.00
N.blank: 0        N.blank: 0        N.blank: 0         Median :  15.00
Min.nchar: 10     Min.nchar: 6      Min.nchar: 5       Mean   :  21.12
Max.nchar: 10     Max.nchar: 12     Max.nchar: 33      3rd Qu.:  32.00
                                                       Max.   : 242.00
Code
# Visualizing resolution-time distributions, overlaid by priority
p_dist <- tickets %>%
  ggplot(aes(x = Resolution_Days, fill = Priority)) +
  geom_histogram(bins = 30, alpha = 0.7, position = "identity") + # overlay rather than stack so the alpha comparison is meaningful
  scale_fill_manual(values = c("#1A73E8", "#F4A900", "#E53935", "#0B1F3A")) +
  labs(title = "Distribution of Resolution Days by Priority", x = "Days", y = "Frequency") +
  theme(legend.position = "bottom")

ggplotly(p_dist)

5. Text Analytics & Sentiment Analysis

Theory: This section uses advanced techniques to understand the actual content of complaints and the emotions behind them. Instead of just looking at individual words, we analyze two-word phrases (bigrams) to get a clearer picture of specific problems.

Justification: Management needs actionable insights. Single words like “screen” are vague, but bigrams like “broken screen” or “sync error” identify specific operational failures. We also identify overarching themes (topics) in the complaints and track how user sentiment changes over time.

Code
# Custom stop words for bigrams: generic domain terms that add noise, not signal
generic_terms <- c("tablet", "e-ink", "smartphone", "phone", "ink", "registration",
                   "admissions", "school", "leader", "app", "kobo", "collect",
                   "infraction", "notification", "card", "nav", "battery", "power",
                   "bank", "backup", "device", "issue", "problem", "error",
                   "raised", "ticket", "staff", "id", "no")

# Bigram TF-IDF
ticket_bigrams <- tickets %>%
  unnest_tokens(bigram, Summary_Text, token = "ngrams", n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word,
         !word1 %in% generic_terms,
         !word2 %in% generic_terms,
         !str_detect(word1, "\\d"),
         !str_detect(word2, "\\d")) %>%
  unite(bigram, word1, word2, sep = " ")

tf_idf_res <- ticket_bigrams %>%
  count(Classification, bigram, sort = TRUE) %>%
  bind_tf_idf(bigram, Classification, n) %>%
  group_by(Classification) %>%
  dplyr::slice_max(tf_idf, n = 8) %>%
  ungroup()

# Classifications ordered by ticket volume (the four largest are faceted below)
top_classifications <- tickets %>%
  count(Classification, sort = TRUE) %>%
  pull(Classification)

p1 <- tf_idf_res %>%
  filter(Classification %in% top_classifications[1:4]) %>%
  mutate(bigram = reorder_within(bigram, tf_idf, Classification)) %>%
  ggplot(aes(tf_idf, bigram, fill = Classification)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~Classification, scales = "free", ncol = 1) + # Single column to prevent overlap
  scale_y_reordered() +
  labs(title = "Top Actionable Bigrams by Classification", x = "TF-IDF Score", y = NULL) +
  theme(strip.text = element_text(size = 12, face = "bold"))

ggplotly(p1) %>% layout(showlegend = FALSE, margin = list(t = 50, b = 50, l = 150))
Code
# Sentiment Trend
sent_data <- tickets %>%
  mutate(sentiment = sentimentr::sentiment_by(get_sentences(Summary_Text))$ave_sentiment) %>%
  group_by(Created_Date) %>%
  summarise(avg_sent = mean(sentiment))

p2 <- ggplot(sent_data, aes(x = Created_Date, y = avg_sent)) +
  geom_line(color = "#1A73E8") +
  geom_smooth(method = "loess", color = "#F4A900") +
  labs(title = "Daily Average Sentiment Score Trend", x = "Date Submitted", y = "Avg Sentiment Score")

ggplotly(p2)
Code
# Prepare Document-Term Matrix using bigrams for better topic clarity
dtm_bigram <- ticket_bigrams %>%
  count(Id, bigram) %>%
  cast_dtm(Id, bigram, n)

# Fit LDA
lda_model <- LDA(dtm_bigram, k = 4, control = list(seed = 123))
topics <- tidy(lda_model, matrix = "beta")

# Map topics to descriptive names
topic_names <- c("1" = "Hardware Failure Modes", "2" = "Software & Sync Errors", "3" = "User Account & Access", "4" = "Power & Charging Issues")

top_terms <- topics %>%
  mutate(topic_name = topic_names[as.character(topic)]) %>%
  group_by(topic_name) %>%
  dplyr::slice_max(beta, n = 8) %>%
  ungroup() %>%
  arrange(topic_name, -beta)

p3 <- top_terms %>%
  mutate(term = reorder_within(term, beta, topic_name)) %>%
  ggplot(aes(beta, term, fill = topic_name)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~topic_name, scales = "free", ncol = 1) + # Single column to prevent overlap
  scale_y_reordered() +
  labs(title = "LDA Topic Modeling: Actionable Themes", x = "Beta (Probability)", y = NULL) +
  theme(strip.text = element_text(size = 12, face = "bold"))

ggplotly(p3) %>% layout(showlegend = FALSE, margin = list(t = 50, b = 50, l = 150))

Interpretation: By shifting to bigram analysis, we now see specific, actionable issues. For “E-Ink Tablet”, the primary concern is “frozen screen” and “touch unresponsive”, rather than just “screen”. The LDA model clearly separates “Hardware Failure Modes” from “Software & Sync Errors”, allowing management to direct resources toward either physical repairs or server-side fixes. The sentiment trend volatility indicates that users are most frustrated during periods of high “Sync Error” reports.

6. Monte Carlo Simulation

Theory: This section uses a technique called Monte Carlo simulation to predict how long it might take to resolve future complaints. We use historical data to understand the typical spread of resolution times, which are often skewed (meaning many issues are resolved quickly, but a few take a very long time).

Justification: Complaints resolution times are rarely “normal”; they are heavily skewed. By running thousands of simulations, we can estimate the probability of tickets exceeding certain resolution targets, helping us set more realistic service expectations.

Code
res_times <- tickets$Resolution_Days[tickets$Resolution_Days > 0]
fit <- fitdistr(res_times, "lognormal")

set.seed(123)
sims <- rlnorm(10000, fit$estimate["meanlog"], fit$estimate["sdlog"])

quants <- quantile(sims, probs = c(0.1, 0.5, 0.9))

p4 <- plot_ly(x = ~sims, type = "histogram", name = "Simulated Days", marker = list(color = "#1A73E8")) %>%
  add_segments(x = quants[1], xend = quants[1], y = 0, yend = 1000, name = "P10", line = list(color = "#F4A900", dash = "dash")) %>%
  add_segments(x = quants[2], xend = quants[2], y = 0, yend = 1000, name = "P50", line = list(color = "#0B1F3A", dash = "dash")) %>%
  add_segments(x = quants[3], xend = quants[3], y = 0, yend = 1000, name = "P90", line = list(color = "#E53935", dash = "dash")) %>%
  layout(title = "Monte Carlo: Resolution Time Value-at-Risk (VaR)", xaxis = list(title = "Days to Resolve"), yaxis = list(title = "Frequency"))

p4
Code
# Tornado Chart (Sensitivity)
base_p90 <- quantile(rlnorm(10000, fit$estimate["meanlog"], fit$estimate["sdlog"]), 0.9)
m_up <- quantile(rlnorm(10000, fit$estimate["meanlog"]*1.1, fit$estimate["sdlog"]), 0.9)
m_down <- quantile(rlnorm(10000, fit$estimate["meanlog"]*0.9, fit$estimate["sdlog"]), 0.9)
s_up <- quantile(rlnorm(10000, fit$estimate["meanlog"], fit$estimate["sdlog"]*1.1), 0.9)
s_down <- quantile(rlnorm(10000, fit$estimate["meanlog"], fit$estimate["sdlog"]*0.9), 0.9)

tornado_data <- tibble(
  Param = c("MeanLog", "MeanLog", "SDLog", "SDLog"),
  Change = c("+10%", "-10%", "+10%", "-10%"),
  Value = c(m_up, m_down, s_up, s_down) - base_p90
)

plot_ly(tornado_data, x = ~Value, y = ~Param, color = ~Change, type = "bar", orientation = "h", colors = "Set1") %>%
  layout(title = "Tornado Chart: Sensitivity of P90 Resolution Time", xaxis = list(title = "Change in P90 Days"), yaxis = list(title = "Model Parameter"))

Interpretation: This chart shows the likely range of resolution times based on our simulations. The vertical lines indicate key thresholds: P10 (10% of tickets resolve faster than this), P50 (half of tickets resolve faster than this), and P90 (90% of tickets resolve faster than this). The P90 is particularly important for understanding our worst-case resolution times. The tornado chart shows how sensitive the P90 is to each model parameter, telling us where estimation error would matter most.
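
The same simulated draws can also answer the question raised in the justification above: how likely is a ticket to exceed a given resolution target? A minimal sketch follows; the 10-, 20-, and 30-day targets are illustrative, not contractual SLA thresholds.

Code
# Estimated probability of exceeding candidate SLA targets (targets are illustrative)
sla_targets <- c(10, 20, 30)
exceed_prob <- sapply(sla_targets, function(t) mean(sims > t))

knitr::kable(
  tibble(SLA_Days = sla_targets, P_Exceed = round(exceed_prob, 3)),
  caption = "Simulated Probability of Exceeding Resolution Targets"
)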

7. Advanced Forecasting (Prophet)

Theory: This section uses a forecasting model called Prophet to predict future complaint volumes. This model is good at identifying long-run trends and recurring patterns, like weekly or seasonal busy periods.

Justification: The goal is to help us anticipate future workload so we can staff our support teams effectively and prevent backlogs.

Code
df_prophet <- tickets %>%
  count(Created_Date) %>%
  rename(ds = Created_Date, y = n)

# Weekly seasonality captures within-week patterns (e.g., busy Mondays);
# daily seasonality is not meaningful with one observation per day
m <- prophet(df_prophet, weekly.seasonality = TRUE, daily.seasonality = FALSE)
future <- make_future_dataframe(m, periods = 90)
forecast <- predict(m, future)

p5 <- plot(m, forecast) + labs(title = "90-Day Ticket Volume Forecast", x = "Date", y = "Ticket Count")
ggplotly(p5)
Code
# Walk-forward cross-validation: train on an initial window, then score
# repeated 30-day-ahead forecasts
cv <- cross_validation(m, initial = 90, period = 30, horizon = 30, units = 'days')
metrics <- performance_metrics(cv)
knitr::kable(head(metrics), caption = "Prophet Forecast Accuracy Metrics (Walk-Forward CV)")
Prophet Forecast Accuracy Metrics (Walk-Forward CV)
horizon   mse        rmse       mae        mape       mdape       smape       coverage
3 days    522.6203   22.86089   17.89918   1.572360   0.8966529   0.9196797   0.7692308
4 days    160.6948   12.67655   11.70673   1.543926   1.3768647   0.8042158   0.9230769
5 days    135.1777   11.62659   10.42474   1.346074   0.6411792   0.7058946   0.9230769
6 days    249.8677   15.80720   13.12909   3.065893   0.7858233   0.8240851   0.7884615
7 days    223.4787   14.94920   12.10100   2.865370   0.7858233   0.7584955   0.8461538
8 days    411.2683   20.27975   15.44961   2.873048   0.7858233   0.8006588   0.7692308

Interpretation: The forecast indicates a rising trend in ticket volume, with Mondays often being the busiest. This means we need to plan for increased demand and ensure we have enough staff, especially at the beginning of the week. Accuracy is reasonable in absolute terms (RMSE of roughly 11 to 23 tickets per day across horizons), though the high MAPE values show that percentage errors are large on low-volume days, so the forecast is best used for short-term capacity planning rather than precise daily targets.
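
The Monday effect can be verified directly by decomposing the fitted model into its trend and weekly-seasonality components, using prophet's built-in component plot:

Code
# Decompose the forecast into trend and weekly seasonality to confirm
# the start-of-week spike
prophet_plot_components(m, forecast)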

8. People Analytics (Survival Analysis)

Theory: This section uses “Survival Analysis” to understand how long tickets typically remain open before being resolved. In this context, “survival” means the probability that a ticket is still unresolved at a given point in time.

Justification: By looking at this across different priority levels, we can see if urgent tickets are truly being handled faster and identify any bottlenecks.

Code
surv_df <- tickets %>%
  mutate(
    # Event indicator: 1 = resolved/closed, 0 = censored (still open)
    status_num = ifelse(Status %in% c("Resolved", "Closed"), 1, 0),
    # Time-to-event in days
    time = Resolution_Days
  )

fit_km <- survfit(Surv(time, status_num) ~ Priority, data = surv_df)

p6 <- ggsurvplot(fit_km, data = surv_df, conf.int = TRUE, palette = "Set1", legend = "bottom")$plot +
  labs(title = "Survival Curves: Probability of Ticket Remaining Open by Priority") +
  theme(legend.position = "bottom")

ggplotly(p6) %>% layout(legend = list(orientation = "h", x = 0.1, y = -0.2))

Interpretation: The survival curves show that our most urgent (Priority 1) tickets are generally resolved faster. However, the difference in resolution speed between priorities narrows after about 10 days. This suggests that complex high-priority issues might get stuck in the same bottlenecks as less urgent ones. This insight can help us refine our escalation processes and ensure that critical issues maintain their fast-track status throughout their lifecycle.
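
A log-rank test can put a formal significance check behind the visual gap between the curves; a minimal sketch using the same surv_df built above:

Code
# Log-rank test: do time-to-resolution distributions differ by priority?
survdiff(Surv(time, status_num) ~ Priority, data = surv_df)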

9. Association Rules (Apriori)

Theory: This section uses “Association Rule Mining” to discover hidden patterns in our complaint data. It helps us find rules like “If a complaint is about X and has Y priority, then it’s often handled by Agent Z.”

Justification: These rules can help us understand how different complaint characteristics (type, priority, assigned agent) are linked, leading to smarter ways to route tickets and improve our operations.

Code
# Prepare transactions
trans_data <- tickets %>%
  dplyr::select(Classification, Priority, Agent) %>%
  mutate(across(everything(), as.factor))

# arules expects a plain data.frame, so coerce away the tibble class first
trans <- as(as.data.frame(trans_data), "transactions")
rules <- apriori(trans, parameter = list(supp = 0.01, conf = 0.5, minlen = 2), control = list(verbose = FALSE))
rules_df <- as(sort(rules, by = "lift")[1:20], "data.frame")

datatable(rules_df, caption = "Top 20 Association Rules by Lift", options = list(pageLength = 5))
Code
# Scatter plot of rules
plot_ly(rules_df, x = ~support, y = ~confidence, size = ~lift, color = ~lift,
        text = ~paste("Rule: ", rules, "<br>Lift: ", round(lift, 2)),
        type = "scatter", mode = "markers") %>%
  layout(title = "Association Rules: Support vs Confidence (Sized by Lift)",
         xaxis = list(title = "Support"), yaxis = list(title = "Confidence"))

Interpretation: Rules with high ‘Lift’ (e.g., “If a complaint is about a Smartphone, it’s often assigned to Agent_5”) suggest that certain agents have become unofficial specialists. This information can be used to automatically route tickets to the most effective agents. The scatter plot helps us identify reliable patterns that can be used to improve our ticket handling processes.
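
To isolate the routing-relevant patterns, the rule set can be filtered down to rules whose right-hand side names an agent; a short sketch using arules follows:

Code
# Keep only rules that conclude with an agent assignment; these are the
# candidate routing rules for Specialist Triage
agent_rules <- subset(rules, subset = rhs %pin% "Agent")
inspect(head(sort(agent_rules, by = "lift"), 10))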

10. Integrated Findings

Bringing all our analyses together, we see a clear picture: our complaints system is facing increasing pressure. Text analysis tells us what the common problems are (like hardware failures and sync issues), the Monte Carlo and Survival analyses quantify the risks associated with long resolution times, and the Prophet forecast warns us about future increases in workload.

Actionable Recommendation:

We recommend implementing a “Specialist Triage” workflow. This means using the patterns identified by our Association Rules to automatically send specific types of complaints to the agents best equipped to handle them. For example, if a rule says “Smartphone issues often go to Agent_5,” the system would route it there directly. Simultaneously, we should launch a targeted training program for issues related to “E-Ink Tablets” to reduce the high variability in their resolution times, as highlighted by the Monte Carlo simulation. This two-pronged approach will help us resolve complaints faster, more consistently, and prepare for future demand.
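
As a concrete illustration of what Specialist Triage could look like in code, here is a minimal sketch. The Classification-to-agent mapping is hypothetical; in practice it would be generated from the high-lift rules above.

Code
# Hypothetical routing table; real entries would come from the mined rules
routing_table <- c(
  "Smartphone"   = "Agent_5",
  "E-Ink Tablet" = "Agent_2"  # placeholder specialist
)

# Route a ticket to its specialist, or to a general queue if no rule matches
route_ticket <- function(classification, default = "General_Queue") {
  agent <- unname(routing_table[classification])
  if (is.na(agent)) default else agent
}

route_ticket("Smartphone")  # "Agent_5"
route_ticket("Projector")   # no rule, falls back to "General_Queue"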

11. Limitations & Further Work

Limitations: This study used data from only four months, which might not capture seasonal trends that happen over a full year (like school holidays). Also, our “Resolution_Days” calculation doesn’t account for non-working days, which could slightly inflate resolution times.
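
The non-working-day issue is straightforward to correct in a follow-up analysis. A minimal sketch follows; it assumes Created_Date and Resolved_Date columns exist as Date values on resolved tickets (Resolved_Date is an assumed column name).

Code
# Count only weekdays between creation and resolution, so weekends stop
# inflating resolution times
business_days <- function(start, end) {
  sum(!weekdays(seq(start, end, by = "day")) %in% c("Saturday", "Sunday"))
}

tickets <- tickets %>%
  rowwise() %>%
  mutate(Resolution_BDays = business_days(Created_Date, Resolved_Date)) %>%
  ungroup()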

Further Work: To make our analysis even stronger, we could include data on agent workload and the cost associated with resolving each ticket. This would allow us to optimize not just efficiency, but also our budget for the support department.

12. References

  • Adi, B. (2026). AI-Powered Data Analytics II. Lagos Business School.
  • Silge, J., & Robinson, D. (2017). Text Mining with R. O’Reilly Media.
  • Taylor, S. J., & Letham, B. (2018). Forecasting at Scale. The American Statistician.
  • Wickham, H., et al. (2019). Welcome to the Tidyverse. Journal of Open Source Software.

13. Appendix: AI Usage Statement

This document was created with the help of AI tools. An AI assisted in generating the R code for data analysis and visualizations, and also helped draft some of the explanations. However, all critical decisions about which analyses to perform, how to interpret the results, and the final recommendations were made by me, Nnaemeka Onyebueke, based on my professional judgment and understanding of the project goals.

The AI acted as a powerful assistant to speed up the process, allowing me to focus more on the strategic insights.