Using Exploratory and Inferential Analytics to Improve Legal Service Delivery and Business Execution

Author

Osinachi Nwandem

Published

May 15, 2026

Code
# Install packages manually if not already installed:
# install.packages(c("tidyverse", "readxl", "janitor", "lubridate", "broom", "ggcorrplot", "knitr", "scales"))

library(tidyverse)
library(readxl)
library(janitor)
library(lubridate)
library(broom)
library(ggcorrplot)
library(knitr)
library(scales)

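# Cramér's V effect size for a contingency table.
# Note: chisq.test() applies Yates' continuity correction to 2 x 2 tables by default,
# so V for a 2 x 2 table is based on the corrected statistic.
cramers_v_calc <- function(tbl) {
  test <- suppressWarnings(chisq.test(tbl))
  chi2 <- unname(test$statistic)
  n <- sum(tbl)
  n_rows <- nrow(tbl)
  n_cols <- ncol(tbl)  # named n_cols to avoid masking base::c()
  sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
}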

1. Executive Summary

This case study applies exploratory and inferential analytics to 123 anonymised legal requests handled by the Legal Department of Nigerian Breweries Plc between January 2025 and April 2026. The purpose of the analysis is to understand the operational factors associated with legal request delays and to identify practical interventions that can improve legal service delivery and business execution.

The analysis shows that 67 out of 123 requests were marked as delayed, representing a 54.5% delay rate. However, for the 85 records usable for turnaround analysis, the median turnaround was 3 days while the average turnaround was 9.7 days. This indicates that many requests are completed quickly, but a smaller number of complex or dependency-heavy matters extend the overall average.

The hypothesis tests did not show that request type, external input or workload concentration independently explain delay at the 5% significance level. However, the logistic regression showed that review rounds are a statistically significant predictor of delay. This suggests that repeated review cycles, incomplete instructions, stakeholder back-and-forth and rework are important operational delay drivers.

The recommended response is a three-pillar framework: a structured triage and tiering model, a first-time-right intake protocol and a support-and-supervise capacity model using NYSC lawyers or fixed-term contract counsel for supervised support work in 2026.

2. Professional Disclosure and Operational Relevance

I work in the Legal Department of a large FMCG company operating in Nigeria. The Legal Department supports business execution across contracts, procurement, sales, supply chain, regulatory matters, employment, governance, litigation and property-related issues. Legal service delivery therefore affects how quickly business teams can contract, manage risk, respond to disputes, engage regulators, obtain approvals and execute commercial priorities.

The dataset used in this case study is based on anonymised legal requests handled within the department. Confidential names, vendor identities, employee names, contract values and privileged matter details were removed or generalised. The analysis is not intended to score individual lawyers. It is intended to understand operational patterns, delay drivers and opportunities to improve legal service delivery.

2.1 Technique Justification

Exploratory Data Analysis (EDA).
EDA is appropriate because the first question is operational: what type of legal work enters the department, where does demand come from, how often are matters delayed, and what does turnaround distribution look like? EDA helps identify workload concentration, missing completion dates, outliers and skewness before any formal testing.

Data Visualisation.
Visualisation is appropriate because legal service delivery is easier to understand when demand patterns, delay rates and turnaround distributions are shown visually. Charts help management see where work comes from, which request categories dominate, and where delay pressure is concentrated.

Hypothesis Testing.
Hypothesis testing is appropriate because the department needs evidence before drawing conclusions about delay drivers. It helps test whether delay is associated with request type, review rounds, external dependencies and workload concentration.

Correlation Analysis.
Correlation analysis is appropriate because several variables may move together. For example, review rounds, external input, management approval and delay status may all relate to turnaround. Correlation helps identify which relationships are strongest, while still recognising that correlation does not prove causation.

Logistic Regression.
Logistic regression is appropriate because the key outcome variable, delay status, is binary: delayed or not delayed. The model helps estimate how review rounds, external input, management approval and workload concentration relate to the likelihood of delay when considered together.

3. Data Collection and Sampling

The dataset contains 123 anonymised legal requests received between January 2025 and April 2026. The data was consolidated from legal department records and email-based matter tracking. The sampling frame covers legal requests across key workstreams, including contract review, contract drafting, litigation and dispute support, governance support, regulatory support, property support, HR/employment support and general legal advisory.

The period January 2025 to April 2026 was selected because it provides more than one calendar year of legal work and gives sufficient observations to meet the case study requirement. It also captures both routine and non-routine legal work across several business cycles.

The dataset contains more than 100 observations and more than six variables, including date fields, categorical variables and numeric variables. Key fields include request date, completion date, request type, requesting department, responsible lawyer, external input required, management approval required, review rounds, delay status and turnaround days.

The data was anonymised before analysis. Matter descriptions were kept broad, and confidential details such as vendor names, employee names, sensitive financial values and privileged legal opinions were removed. The dataset is suitable for this academic exercise because it reflects real operational activity while protecting confidentiality.

Code
data_file <- "Legal Requests.xlsx"

raw_data <- read_excel(data_file, sheet = "Cleaned Master Dataset")

legal_data <- raw_data %>%
  clean_names() %>%
  mutate(
    appeared_delayed_flag = if_else(appeared_delayed == "Yes", 1, 0),
    external_input_flag = if_else(external_input_required == "Yes", 1, 0),
    management_approval_flag = if_else(management_approval_required == "Yes", 1, 0),
    osinachi_group = if_else(responsible_lawyer == "Osinachi", "Senior resource", "Others"),
    osinachi_flag = if_else(responsible_lawyer == "Osinachi", 1, 0),
    turnaround_days = as.numeric(turnaround_days),
    review_rounds_numeric = as.numeric(review_rounds_numeric),
    month_request_came_in = as.character(month_request_came_in)
  )

turnaround_data <- legal_data %>%
  filter(use_for_turnaround_analysis == "Yes", !is.na(turnaround_days))

nrow(legal_data)
[1] 123
Code
nrow(turnaround_data)
[1] 85

4. Data Description

Code
summary_table <- tibble(
  Metric = c(
    "Total records",
    "Records usable for turnaround analysis",
    "Records marked delayed",
    "Records marked not delayed",
    "Delay rate",
    "Average turnaround days",
    "Median turnaround days",
    "Minimum turnaround days",
    "Maximum turnaround days"
  ),
  Value = c(
    nrow(legal_data),
    nrow(turnaround_data),
    sum(legal_data$appeared_delayed == "Yes", na.rm = TRUE),
    sum(legal_data$appeared_delayed == "No", na.rm = TRUE),
    percent(mean(legal_data$appeared_delayed == "Yes", na.rm = TRUE), accuracy = 0.1),
    round(mean(turnaround_data$turnaround_days, na.rm = TRUE), 1),
    median(turnaround_data$turnaround_days, na.rm = TRUE),
    min(turnaround_data$turnaround_days, na.rm = TRUE),
    max(turnaround_data$turnaround_days, na.rm = TRUE)
  )
)

kable(summary_table, caption = "Summary of the Legal Request Dataset")
Summary of the Legal Request Dataset
| Metric | Value |
|:---|:---|
| Total records | 123 |
| Records usable for turnaround analysis | 85 |
| Records marked delayed | 67 |
| Records marked not delayed | 56 |
| Delay rate | 54.5% |
| Average turnaround days | 9.7 |
| Median turnaround days | 3 |
| Minimum turnaround days | 0 |
| Maximum turnaround days | 59 |

The dataset contains 123 records. Of these, 85 records are usable for turnaround-day analysis because they have sufficiently clear completion dates and date consistency. There are 67 delayed matters and 56 non-delayed matters, giving an overall delay rate of 54.5%. Average turnaround is about 9.7 days, but median turnaround is 3 days. This difference between the mean and median shows that turnaround time is right-skewed: many matters are completed quickly, while a smaller number take much longer and pull the average upward.
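The gap between the mean and the median can be illustrated with a small sketch. The turnaround values below are hypothetical, chosen only so that the median is 3 days; they are not taken from the dataset.

```r
# Hypothetical turnaround days (illustrative only, not the real records)
toy_turnaround <- c(0, 1, 1, 2, 3, 3, 4, 6, 45, 59)

toy_mean <- mean(toy_turnaround)
toy_median <- median(toy_turnaround)

# Mean after setting aside the two longest matters
trimmed_mean <- mean(toy_turnaround[toy_turnaround < 40])

round(c(mean = toy_mean,
        median = toy_median,
        mean_without_long_matters = trimmed_mean), 1)
```

In this toy example two long matters are enough to lift the mean well above the median, which is the same pattern the summary table shows for the real data.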

Code
request_type_summary <- legal_data %>%
  count(clean_request_type, sort = TRUE) %>%
  rename(Request_Type = clean_request_type, Count = n)

kable(request_type_summary, caption = "Request Volume by Request Type")
Request Volume by Request Type
| Request_Type | Count |
|:---|---:|
| Contract review | 26 |
| Litigation/dispute support | 20 |
| Governance/company secretarial support | 17 |
| Regulatory support | 16 |
| Contract drafting | 14 |
| Real estate/property support | 9 |
| Contract renewal | 6 |
| Legal advisory | 5 |
| Contract execution/approval | 4 |
| HR/employment support | 4 |
| Contract administration | 1 |
| Contract administration support | 1 |
Code
department_summary <- legal_data %>%
  count(clean_requesting_department, sort = TRUE) %>%
  rename(Requesting_Department = clean_requesting_department, Count = n)

kable(head(department_summary, 12), caption = "Top Requesting Departments")
Top Requesting Departments
| Requesting_Department | Count |
|:---|---:|
| Supply Chain | 19 |
| Procurement | 18 |
| Legal/Internal | 10 |
| Finance | 9 |
| Regulatory/Compliance | 9 |
| Marketing | 8 |
| Operations/Supply Chain | 8 |
| Sales | 8 |
| HR | 4 |
| Pensions subsidiary | 3 |
| Sales/Commercial | 3 |
| Corporate Affairs | 2 |
Code
lawyer_summary <- legal_data %>%
  count(responsible_lawyer, sort = TRUE) %>%
  mutate(Percentage = percent(n / sum(n), accuracy = 0.1)) %>%
  rename(Responsible_Lawyer = responsible_lawyer, Count = n)

kable(lawyer_summary, caption = "Responsible Lawyer Workload Distribution")
Responsible Lawyer Workload Distribution
| Responsible_Lawyer | Count | Percentage |
|:---|---:|---:|
| Osinachi | 60 | 48.8% |
| Multiple lawyers | 18 | 14.6% |
| Ibironke | 16 | 13.0% |
| Babatope | 13 | 10.6% |
| Peter | 10 | 8.1% |
| Victoria | 4 | 3.3% |
| Legal (copied) | 1 | 0.8% |
| Oghenekevwe | 1 | 0.8% |

The responsible lawyer distribution is used only to examine workload concentration. It is not used as individual performance scoring because the complexity and sensitivity of matters may differ across lawyers.

5. Technique 1: Exploratory Data Analysis

5.1 Data Quality Review

Code
data_quality <- legal_data %>%
  summarise(
    total_records = n(),
    missing_completion_dates = sum(is.na(date_legal_completed_work)),
    records_excluded_from_turnaround = sum(use_for_turnaround_analysis == "No"),
    unclear_delay_flags = sum(appeared_delayed == "Not clear", na.rm = TRUE),
    data_quality_ok = sum(data_quality_note == "OK", na.rm = TRUE)
  )

kable(data_quality, caption = "Data Quality Checks")
Data Quality Checks
| total_records | missing_completion_dates | records_excluded_from_turnaround | unclear_delay_flags | data_quality_ok |
|---:|---:|---:|---:|---:|
| 123 | 38 | 38 | 0 | 123 |

The data quality review shows two important points. First, delay status is now clean because the dataset no longer contains unclear delay flags. Second, 38 records are excluded from turnaround-day analysis because they are still open, incomplete or do not have a usable completion timeline. These records remain relevant for understanding workload and delay status, but they should not be used when calculating turnaround days.

5.2 Turnaround Distribution and Outliers

Code
turnaround_summary <- turnaround_data %>%
  summarise(
    mean_turnaround = mean(turnaround_days),
    median_turnaround = median(turnaround_days),
    sd_turnaround = sd(turnaround_days),
    min_turnaround = min(turnaround_days),
    max_turnaround = max(turnaround_days),
    q1 = quantile(turnaround_days, 0.25),
    q3 = quantile(turnaround_days, 0.75)
  )

kable(turnaround_summary, caption = "Turnaround Days Summary")
Turnaround Days Summary
| mean_turnaround | median_turnaround | sd_turnaround | min_turnaround | max_turnaround | q1 | q3 |
|---:|---:|---:|---:|---:|---:|---:|
| 9.741176 | 3 | 14.65975 | 0 | 59 | 1 | 11 |

The turnaround distribution is skewed. The median of 3 days shows that many matters move quickly. However, the maximum of 59 days and the higher mean show that a smaller number of long-running matters create extended completion timelines. This suggests that the operational focus should not only be on average turnaround. The department should identify and manage matters that become outliers.
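One way to make "outlier" operational is Tukey's interquartile-range rule of thumb. The sketch below applies it to the quartiles reported in the table above (q1 = 1, q3 = 11). The choice of this rule and the sample values are assumptions for illustration, not part of the original analysis.

```r
# Quartiles reported in the Turnaround Days Summary table
q1 <- 1
q3 <- 11

iqr <- q3 - q1
upper_fence <- q3 + 1.5 * iqr  # Tukey's rule of thumb for high outliers

# Hypothetical turnaround values, for illustration only
toy_turnaround <- c(0, 2, 3, 11, 20, 30, 59)
flagged <- toy_turnaround[toy_turnaround > upper_fence]

upper_fence
flagged
```

Under this rule, any matter taking longer than the upper fence would be a candidate for closer review. The department may prefer a different threshold, for example one tied to service-level commitments.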

6. Technique 2: Data Visualisation

The visualisation section tells one story: legal delay is not evenly distributed across all work. It appears more strongly where matters involve complexity, review cycles, approvals, dependencies or senior-resource concentration.

6.1 Request Volume by Type

Code
legal_data %>%
  count(clean_request_type, sort = TRUE) %>%
  ggplot(aes(x = reorder(clean_request_type, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Legal Request Volume by Request Type",
    x = "Request Type",
    y = "Number of Requests"
  ) +
  theme_minimal()

This chart shows that contract review, litigation/dispute support, governance/company secretarial support, regulatory support and contract drafting make up the largest categories of legal work. The department is therefore supporting both routine business execution and strategic risk management.

6.2 Request Volume by Department

Code
legal_data %>%
  count(clean_requesting_department, sort = TRUE) %>%
  slice_max(n, n = 12, with_ties = FALSE) %>%  # keep exactly 12 rows, matching the table above
  ggplot(aes(x = reorder(clean_requesting_department, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top Requesting Departments",
    x = "Requesting Department",
    y = "Number of Requests"
  ) +
  theme_minimal()

This chart shows that demand is concentrated in business execution functions, especially Supply Chain and Procurement. Because most requests come from commercial and operational teams, legal turnaround is likely to have a direct effect on business execution.

6.3 Turnaround Days Distribution

Code
turnaround_data %>%
  ggplot(aes(x = turnaround_days)) +
  geom_histogram(binwidth = 5, boundary = 0) +
  labs(
    title = "Distribution of Turnaround Days",
    x = "Turnaround Days",
    y = "Number of Matters"
  ) +
  theme_minimal()

This chart shows that most matters are completed within a short period, while a smaller number take much longer to complete. In turnaround terms, this reinforces the point that the department has a tail-risk issue rather than a uniform delay problem.

6.4 Delay Rate by Request Type

Code
legal_data %>%
  group_by(clean_request_type) %>%
  summarise(
    count = n(),
    delay_rate = mean(appeared_delayed == "Yes"),
    .groups = "drop"
  ) %>%
  ggplot(aes(x = reorder(clean_request_type, delay_rate), y = delay_rate)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = percent_format()) +
  labs(
    title = "Delay Rate by Request Type",
    x = "Request Type",
    y = "Delay Rate"
  ) +
  theme_minimal()

This chart shows that some request types appear more delay-prone than others. However, the hypothesis test below shows that request type alone does not statistically explain delay at the 5% significance level.

6.5 Workload Distribution by Responsible Lawyer

Code
legal_data %>%
  count(responsible_lawyer, sort = TRUE) %>%
  ggplot(aes(x = reorder(responsible_lawyer, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Workload Distribution by Responsible Lawyer",
    x = "Responsible Lawyer",
    y = "Number of Requests"
  ) +
  theme_minimal()

The workload chart shows that 60 out of 123 matters, representing 48.8% of all requests, were handled by one senior legal resource. This should not be read as an individual performance score. It should be read as evidence of workload concentration and potential capacity risk.

6.6 Turnaround by Workload Group

Code
turnaround_data %>%
  ggplot(aes(x = osinachi_group, y = turnaround_days)) +
  geom_boxplot() +
  labs(
    title = "Turnaround Days: Senior Resource vs Others",
    x = "Workload Group",
    y = "Turnaround Days"
  ) +
  theme_minimal()

This chart allows us to compare turnaround for matters handled by the senior resource against all other matters. The chart should be interpreted with care because the senior resource may be handling more complex, sensitive or approval-heavy matters.

7. Technique 3: Hypothesis Testing

7.1 Purpose of the Hypothesis Tests

The hypothesis tests were used to examine whether selected operational factors are statistically associated with legal request delay. The tests were not designed to assign blame to any lawyer, department or requester. They were designed to identify whether delay is linked to request type, review cycles, external dependencies or workload concentration.

Because the dependent variable for most tests is categorical, chi-square tests were used where the outcome was delay status. Spearman correlation was used for review rounds and turnaround days because turnaround days are skewed and review rounds are ordinal/numeric. Mann-Whitney tests were used where turnaround days were compared across two groups.

7.2 Hypothesis 1: Request Type and Delay

Business question: Are some request types more likely to be delayed than others?

Null hypothesis (H0): Delay status is independent of request type.
Alternative hypothesis (H1): Delay status differs by request type.

Test used: Chi-square test of independence.

Code
h1_assumptions <- tibble(
  Assumption = c(
    "Observations are independent",
    "Variables are categorical",
    "Expected cell counts are adequate",
    "Test purpose is association, not causation"
  ),
  Assessment = c(
    "Each row represents a separate legal request.",
    "Request type and delay status are categorical.",
    "Some request categories have low counts, so the result should be interpreted cautiously.",
    "The test only checks whether request type and delay status are associated."
  )
)

kable(h1_assumptions, caption = "Assumption Check: Request Type and Delay")
Assumption Check: Request Type and Delay
| Assumption | Assessment |
|:---|:---|
| Observations are independent | Each row represents a separate legal request. |
| Variables are categorical | Request type and delay status are categorical. |
| Expected cell counts are adequate | Some request categories have low counts, so the result should be interpreted cautiously. |
| Test purpose is association, not causation | The test only checks whether request type and delay status are associated. |
Code
h1_table <- table(legal_data$clean_request_type, legal_data$appeared_delayed)
h1_test <- suppressWarnings(chisq.test(h1_table))
h1_effect <- cramers_v_calc(h1_table)

h1_results <- tibble(
  Test = "Chi-square test of independence",
  Statistic = round(unname(h1_test$statistic), 3),
  P_Value = round(h1_test$p.value, 3),
  Effect_Size_Cramers_V = round(h1_effect, 3),
  Decision = "Not significant at 5%"
)

kable(h1_results, caption = "Hypothesis 1 Result: Request Type and Delay")
Hypothesis 1 Result: Request Type and Delay
| Test | Statistic | P_Value | Effect_Size_Cramers_V | Decision |
|:---|---:|---:|---:|:---|
| Chi-square test of independence | 13.099 | 0.287 | 0.326 | Not significant at 5% |

The p-value (0.287) is above 0.05, so the data does not provide statistically significant evidence that delay status is associated with request type. Cramér's V of 0.326 would conventionally be read as a moderate association, but V tends to be inflated when a table has many categories with small counts, as it does here, so it should be treated as indicative only. The descriptive results show that some request categories appear more delay-prone than others. A reasonable interpretation is that delay is not driven by request type in isolation; it is more likely driven by the complexity, dependencies, approvals and stakeholder interactions that sit behind some request types.
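Because several request categories contain very few records, the large-sample chi-square approximation may be unreliable. The sketch below shows one way to check the expected counts and to obtain a Monte Carlo p-value with the `simulate.p.value` argument of `chisq.test()`. The table is hypothetical and the robustness check itself is a suggestion, not part of the reported analysis.

```r
# Hypothetical counts: rows are request types, columns are delay status
toy_table <- matrix(
  c(12, 14,
     9, 11,
     1,  3,
     0,  1),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    request_type = c("Type A", "Type B", "Type C", "Type D"),
    delayed = c("No", "Yes")
  )
)

# Expected counts under independence; a common rule of thumb wants most of them >= 5
expected <- suppressWarnings(chisq.test(toy_table))$expected
share_below_5 <- mean(expected < 5)

# Monte Carlo p-value, which does not rely on the large-sample approximation
set.seed(2026)
mc_test <- chisq.test(toy_table, simulate.p.value = TRUE, B = 5000)

share_below_5
mc_test$p.value
```

If the simulated p-value leads to the same decision as the standard test, the conclusion is less likely to be an artefact of sparse categories.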

7.3 Hypothesis 2: Review Rounds and Turnaround Time

Business question: Do matters with more review rounds take longer to complete?

Null hypothesis (H0): There is no relationship between review rounds and turnaround days.
Alternative hypothesis (H1): More review rounds are associated with longer turnaround days.

Test used: Spearman correlation.

Code
h2_assumptions <- tibble(
  Assumption = c(
    "Variables are ordinal or numeric",
    "Relationship is monotonic",
    "Normality is not required",
    "Outliers are possible"
  ),
  Assessment = c(
    "Review rounds numeric and turnaround days are numeric.",
    "The business logic suggests that more review rounds may increase turnaround time.",
    "Spearman correlation is appropriate because turnaround days are skewed.",
    "Some matters have very long turnaround times, so Spearman is preferred over Pearson."
  )
)

kable(h2_assumptions, caption = "Assumption Check: Review Rounds and Turnaround Time")
Assumption Check: Review Rounds and Turnaround Time
| Assumption | Assessment |
|:---|:---|
| Variables are ordinal or numeric | Review rounds numeric and turnaround days are numeric. |
| Relationship is monotonic | The business logic suggests that more review rounds may increase turnaround time. |
| Normality is not required | Spearman correlation is appropriate because turnaround days are skewed. |
| Outliers are possible | Some matters have very long turnaround times, so Spearman is preferred over Pearson. |
Code
h2_data <- turnaround_data %>%
  filter(!is.na(review_rounds_numeric), !is.na(turnaround_days))

h2_test <- cor.test(
  h2_data$review_rounds_numeric,
  h2_data$turnaround_days,
  method = "spearman",
  exact = FALSE
)

h2_results <- tibble(
  Test = "Spearman correlation",
  Spearman_Rho = round(unname(h2_test$estimate), 3),
  P_Value = round(h2_test$p.value, 3),
  Records_Used = nrow(h2_data),
  Decision = "Weak positive relationship, not significant at 5%"
)

kable(h2_results, caption = "Hypothesis 2 Result: Review Rounds and Turnaround Time")
Hypothesis 2 Result: Review Rounds and Turnaround Time
| Test | Spearman_Rho | P_Value | Records_Used | Decision |
|:---|---:|---:|---:|:---|
| Spearman correlation | 0.189 | 0.087 | 83 | Weak positive relationship, not significant at 5% |

The Spearman correlation is positive (rho = 0.189), meaning that matters with more review rounds tend to have longer turnaround days. However, the p-value of 0.087, which is the two-sided value that cor.test() reports by default, is above 0.05, so the relationship is not statistically significant at the 5% level. The result should not be ignored, because it provides a weak signal that review rounds may contribute to operational delay. It becomes more important when considered alongside the logistic regression, where review rounds later emerge as a significant predictor of delay.
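The alternative hypothesis above is directional, whereas `cor.test()` uses a two-sided alternative unless told otherwise. A minimal sketch of how a one-sided version could be requested is shown below, using hypothetical data. Whether a one-sided test is appropriate should be decided before looking at the data, and this sketch does not re-run the reported analysis.

```r
# Hypothetical review rounds and turnaround days (illustrative only)
toy_rounds <- c(1, 1, 2, 1, 3, 2, 4, 1, 2, 3, 5, 2)
toy_days   <- c(2, 5, 1, 3, 9, 12, 7, 0, 4, 20, 15, 6)

# Default: two-sided alternative
two_sided <- cor.test(toy_rounds, toy_days, method = "spearman", exact = FALSE)

# Directional alternative: rho greater than zero
one_sided <- cor.test(toy_rounds, toy_days, method = "spearman", exact = FALSE,
                      alternative = "greater")

c(rho = unname(two_sided$estimate),
  p_two_sided = two_sided$p.value,
  p_one_sided = one_sided$p.value)
```

The estimate of rho is the same in both calls; only the p-value changes with the stated alternative.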

7.4 Hypothesis 3: External Input and Delay

Business question: Are matters requiring external input more likely to be delayed?

Null hypothesis (H0): Delay status is independent of whether external input is required.
Alternative hypothesis (H1): Matters requiring external input are more likely to be delayed.

Tests used: Chi-square test and Mann-Whitney U test.

Code
h3_assumptions <- tibble(
  Assumption = c(
    "Observations are independent",
    "Variables are properly classified",
    "Delay status is categorical",
    "Turnaround days are skewed"
  ),
  Assessment = c(
    "Each request is treated as a separate matter.",
    "External input is coded as Yes or No.",
    "Delay status is coded as Yes or No.",
    "Mann-Whitney is appropriate for comparing turnaround days across groups."
  )
)

kable(h3_assumptions, caption = "Assumption Check: External Input and Delay")
Assumption Check: External Input and Delay
| Assumption | Assessment |
|:---|:---|
| Observations are independent | Each request is treated as a separate matter. |
| Variables are properly classified | External input is coded as Yes or No. |
| Delay status is categorical | Delay status is coded as Yes or No. |
| Turnaround days are skewed | Mann-Whitney is appropriate for comparing turnaround days across groups. |
Code
h3_table <- table(legal_data$external_input_required, legal_data$appeared_delayed)
h3_chi <- suppressWarnings(chisq.test(h3_table))
h3_effect <- cramers_v_calc(h3_table)

h3_turnaround <- turnaround_data %>%
  filter(external_input_required %in% c("Yes", "No"))

h3_mw <- wilcox.test(turnaround_days ~ external_input_required, data = h3_turnaround)

h3_results <- tibble(
  Test = c("Chi-square: external input vs delay", "Mann-Whitney: external input vs turnaround"),
  P_Value = c(round(h3_chi$p.value, 3), round(h3_mw$p.value, 3)),
  Effect_Size = c(round(h3_effect, 3), NA_real_),
  Decision = c("Not significant at 5%", "Not significant at 5%")
)

kable(h3_results, caption = "Hypothesis 3 Result: External Input and Delay")
Hypothesis 3 Result: External Input and Delay
| Test | P_Value | Effect_Size | Decision |
|:---|---:|---:|:---|
| Chi-square: external input vs delay | 0.245 | 0.105 | Not significant at 5% |
| Mann-Whitney: external input vs turnaround | 0.421 | NA | Not significant at 5% |

The results do not show statistically significant evidence that external input alone is associated with delay or with longer turnaround. This means that the presence of a vendor, counterparty, regulator or external counsel does not automatically explain whether a matter is delayed. However, external input remains operationally relevant because it introduces dependency risk. Legal may not fully control the timeline once a matter depends on third-party responses or external approvals.
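The Mann-Whitney row in the table above has no effect size. One common option is the rank-biserial correlation, which can be derived from the W statistic that `wilcox.test()` returns. The helper `rank_biserial()` and the data below are hypothetical and shown only as a sketch of how the gap could be filled.

```r
# Rank-biserial correlation from the Mann-Whitney W statistic:
# r = 2 * W / (n1 * n2) - 1, ranging from -1 to 1.
# Positive values mean the first group tends to have larger values.
rank_biserial <- function(x, y) {
  w <- unname(wilcox.test(x, y, exact = FALSE)$statistic)
  2 * w / (length(x) * length(y)) - 1
}

# Hypothetical turnaround days for two groups (illustrative only)
no_external_input <- c(0, 1, 2, 3, 3, 5)
external_input    <- c(1, 4, 6, 9, 14)

toy_effect <- rank_biserial(external_input, no_external_input)
round(toy_effect, 3)
```

A value near 0 indicates little separation between the groups, while values near 1 or -1 indicate that one group almost always has the longer turnaround.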

7.5 Hypothesis 4: Workload Concentration and Delay

Business question: Does concentration of work in one senior legal resource increase delay risk or turnaround time?

Null hypothesis (H0): Matters handled by the high-workload senior resource do not have higher delay rates or longer turnaround times than matters handled by others.
Alternative hypothesis (H1): Matters handled by the high-workload senior resource have higher delay rates or longer turnaround times than matters handled by others.

Tests used: Chi-square test and Mann-Whitney U test.

Code
h4_assumptions <- tibble(
  Assumption = c(
    "Observations are independent",
    "Groups are properly defined",
    "Delay status is categorical",
    "Turnaround days are skewed",
    "Workload is not performance scoring"
  ),
  Assessment = c(
    "Each row represents a separate request.",
    "Matters were grouped into senior-resource matters and others.",
    "Delay status is coded as Yes or No.",
    "Mann-Whitney is appropriate for group comparison.",
    "Responsible lawyer is used to understand capacity distribution, not individual performance."
  )
)

kable(h4_assumptions, caption = "Assumption Check: Workload Concentration and Delay")
Assumption Check: Workload Concentration and Delay
| Assumption | Assessment |
|:---|:---|
| Observations are independent | Each row represents a separate request. |
| Groups are properly defined | Matters were grouped into senior-resource matters and others. |
| Delay status is categorical | Delay status is coded as Yes or No. |
| Turnaround days are skewed | Mann-Whitney is appropriate for group comparison. |
| Workload is not performance scoring | Responsible lawyer is used to understand capacity distribution, not individual performance. |
Code
h4_table <- table(legal_data$osinachi_group, legal_data$appeared_delayed)
h4_chi <- suppressWarnings(chisq.test(h4_table))
h4_effect <- cramers_v_calc(h4_table)

h4_turnaround <- turnaround_data %>%
  filter(osinachi_group %in% c("Senior resource", "Others"))

h4_mw <- wilcox.test(turnaround_days ~ osinachi_group, data = h4_turnaround)

h4_descriptive <- h4_turnaround %>%
  group_by(osinachi_group) %>%
  summarise(
    records = n(),
    average_turnaround = round(mean(turnaround_days), 1),
    median_turnaround = median(turnaround_days),
    .groups = "drop"
  )

h4_results <- tibble(
  Test = c("Chi-square: workload group vs delay", "Mann-Whitney: workload group vs turnaround"),
  P_Value = c(round(h4_chi$p.value, 3), round(h4_mw$p.value, 3)),
  Effect_Size = c(round(h4_effect, 3), NA_real_),
  Decision = c("Not significant at 5%", "Not significant at 5%")
)

kable(h4_results, caption = "Hypothesis 4 Result: Workload Concentration and Delay")
Hypothesis 4 Result: Workload Concentration and Delay
| Test | P_Value | Effect_Size | Decision |
|:---|---:|---:|:---|
| Chi-square: workload group vs delay | 1.000 | 0 | Not significant at 5% |
| Mann-Whitney: workload group vs turnaround | 0.897 | NA | Not significant at 5% |
Code
kable(h4_descriptive, caption = "Turnaround by Workload Group")
Turnaround by Workload Group
| osinachi_group | records | average_turnaround | median_turnaround |
|:---|---:|---:|---:|
| Others | 38 | 6.6 | 2.5 |
| Senior resource | 47 | 12.3 | 4.0 |

The statistical tests do not show that workload concentration alone is associated with delay. The delay rate for the senior resource is close to that of the rest of the team, and the difference in turnaround between the two groups is not statistically significant (Mann-Whitney p = 0.897). However, the descriptive data is still important. One senior legal resource handled 60 out of 123 requests, representing 48.8% of all matters. Both the average turnaround (12.3 days against 6.6) and the median turnaround (4.0 days against 2.5) are higher for that group. This suggests a concentration risk, especially where the senior resource is likely to handle complex, sensitive, high-stakes or approval-heavy matters.
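A p-value of exactly 1.000 with an effect size of 0 can look surprising. It is consistent with how `chisq.test()` treats 2 x 2 tables: Yates' continuity correction is applied by default, and when observed and expected counts differ by less than 0.5 the corrected statistic becomes zero. The sketch below illustrates this. The margins match the reported totals (60 and 63 matters, 56 not delayed and 67 delayed), but the individual cell counts are an assumption made for illustration.

```r
# Illustrative 2 x 2 table; cell counts are assumed, margins match the reported totals
toy_table <- matrix(
  c(27, 33,
    29, 34),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    group = c("Senior resource", "Others"),
    delayed = c("No", "Yes")
  )
)

with_correction    <- chisq.test(toy_table)                   # Yates' correction (default for 2 x 2)
without_correction <- chisq.test(toy_table, correct = FALSE)  # plain Pearson chi-square

# Delay rate within each group, for context
delay_rates <- prop.table(toy_table, margin = 1)[, "Yes"]

round(c(stat_corrected   = unname(with_correction$statistic),
        p_corrected      = with_correction$p.value,
        stat_uncorrected = unname(without_correction$statistic),
        p_uncorrected    = without_correction$p.value), 3)
round(delay_rates, 3)
```

Either way the two groups have nearly identical delay rates in this example, so the practical conclusion does not depend on the correction.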

7.6 Summary of Hypothesis Test Results

Code
hypothesis_summary <- tibble(
  Hypothesis = c(
    "H1: Request type and delay",
    "H2: Review rounds and turnaround",
    "H3: External input and delay",
    "H4: Workload concentration and delay"
  ),
  Main_Test = c(
    "Chi-square",
    "Spearman correlation",
    "Chi-square and Mann-Whitney",
    "Chi-square and Mann-Whitney"
  ),
  Main_Result = c(
    "Not statistically significant at 5%",
    "Weak positive relationship, not significant at 5%",
    "Not statistically significant at 5%",
    "Not statistically significant at 5%"
  ),
  Business_Reading = c(
    "Request type alone does not explain delay; hidden complexity may matter more.",
    "More review rounds appear to add friction, but do not fully explain turnaround.",
    "External dependency matters operationally, but does not independently explain delay.",
    "Workload concentration is a capacity risk, but must be interpreted with matter complexity."
  )
)

kable(hypothesis_summary, caption = "Summary of Hypothesis Tests")
Summary of Hypothesis Tests
| Hypothesis | Main_Test | Main_Result | Business_Reading |
|:---|:---|:---|:---|
| H1: Request type and delay | Chi-square | Not statistically significant at 5% | Request type alone does not explain delay; hidden complexity may matter more. |
| H2: Review rounds and turnaround | Spearman correlation | Weak positive relationship, not significant at 5% | More review rounds appear to add friction, but do not fully explain turnaround. |
| H3: External input and delay | Chi-square and Mann-Whitney | Not statistically significant at 5% | External dependency matters operationally, but does not independently explain delay. |
| H4: Workload concentration and delay | Chi-square and Mann-Whitney | Not statistically significant at 5% | Workload concentration is a capacity risk, but must be interpreted with matter complexity. |

The hypothesis tests show that no single variable explains delay in isolation. This is a useful finding. It suggests that delay is a system issue rather than a single-factor problem. A plausible reading, which these tests do not establish directly, is that delays arise where complexity, review cycles, approvals, external dependencies and workload concentration overlap.

8. Technique 4: Correlation Analysis

8.1 Purpose of the Correlation Analysis

Correlation analysis was used to examine how the numeric and binary-coded operational variables move together. The analysis focused on delay status, turnaround days, review rounds, external input, management approval and workload concentration.

The purpose was not to prove causation. The purpose was to identify relationships that may help explain the operating pattern behind legal request delays.

Code
correlation_data <- legal_data %>%
  transmute(
    delay_flag = appeared_delayed_flag,
    turnaround_days = turnaround_days,
    review_rounds_numeric = review_rounds_numeric,
    external_input_flag = external_input_flag,
    management_approval_flag = management_approval_flag,
    senior_resource_flag = osinachi_flag
  )

cor_matrix <- correlation_data %>%
  select(where(is.numeric)) %>%
  cor(use = "pairwise.complete.obs", method = "spearman")

ggcorrplot(
  cor_matrix,
  lab = TRUE,
  type = "lower",
  title = "Spearman Correlation Heatmap"
)

Code
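# Reshape the matrix to one row per pair, keeping each unordered pair once
# and dropping the diagonal (a variable paired with itself).
cor_pairs <- as.data.frame(as.table(cor_matrix)) %>%
  filter(as.character(Var1) < as.character(Var2)) %>%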
  mutate(abs_corr = abs(Freq)) %>%
  arrange(desc(abs_corr)) %>%
  slice_head(n = 6) %>%
  rename(Variable_1 = Var1, Variable_2 = Var2, Correlation = Freq)

kable(cor_pairs, caption = "Strongest Correlations")
Strongest Correlations
Variable_1 Variable_2 Correlation abs_corr
delay_flag turnaround_days 0.5911323 0.5911323
external_input_flag review_rounds_numeric -0.2146880 0.2146880
review_rounds_numeric turnaround_days 0.1967898 0.1967898
delay_flag review_rounds_numeric 0.1803123 0.1803123
delay_flag external_input_flag 0.1214440 0.1214440
external_input_flag turnaround_days 0.1123186 0.1123186

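The correlation matrix and heatmap show that no single operational variable has a strong relationship with delay or turnaround. The largest coefficient, 0.59 between delay status and turnaround days, links the two outcome measures to each other, which is expected. Every other pair is weak, with absolute coefficients of about 0.21 or less. This is consistent with the hypothesis tests. The data points to a multi-factor delay environment rather than a single dominant cause.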

The most practically relevant relationships are those involving review rounds, delay status, turnaround days, external input and management approval. These variables reflect the operational reality that legal turnaround is affected by both the legal work itself and the wider business process around the work.
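
The heatmap reports coefficients only. A minimal sketch of how a p-value could be attached to a single pair is shown below, using base R's cor.test(). The data frame `toy_data` and its values are hypothetical and are not drawn from the legal requests dataset.

```r
# Hypothetical toy data: NOT the legal requests dataset.
toy_data <- data.frame(
  review_rounds   = c(1, 1, 2, 2, 3, 1, 2, 3, 1, 2),
  turnaround_days = c(2, 3, 5, 4, 21, 1, 7, 30, 3, 6)
)

# Spearman's rho with an approximate p-value. exact = FALSE avoids the
# warning about ties, which are common in count data such as review rounds.
spearman_result <- cor.test(
  toy_data$review_rounds,
  toy_data$turnaround_days,
  method = "spearman",
  exact = FALSE
)

spearman_summary <- data.frame(
  rho = unname(spearman_result$estimate),
  p_value = spearman_result$p.value
)
print(spearman_summary)
```

Applied to the real columns, the same call would show whether a weak coefficient is distinguishable from zero given the number of complete pairs available.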

9. Technique 5: Logistic Regression

9.1 Purpose of the Model

Logistic regression was used because the main outcome variable is binary: a request either appeared delayed or did not appear delayed. The model tested whether review rounds, external input, management approval and workload concentration predict delay when considered together.

Code
model_data <- legal_data %>%
  filter(!is.na(review_rounds_numeric)) %>%
  mutate(
    external_input_flag = as.numeric(external_input_flag),
    management_approval_flag = as.numeric(management_approval_flag),
    senior_resource_flag = as.numeric(osinachi_flag)
  )

delay_model <- glm(
  appeared_delayed_flag ~ review_rounds_numeric + external_input_flag +
    management_approval_flag + senior_resource_flag,
  data = model_data,
  family = binomial
)

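# exponentiate = TRUE reports odds ratios (and their confidence limits)
# instead of log-odds coefficients.
model_results <- tidy(delay_model, exponentiate = TRUE, conf.int = TRUE) %>%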
  filter(term != "(Intercept)") %>%
  mutate(
    estimate = round(estimate, 3),
    conf.low = round(conf.low, 3),
    conf.high = round(conf.high, 3),
    p.value = round(p.value, 3)
  ) %>%
  rename(
    Predictor = term,
    Odds_Ratio = estimate,
    Conf_Low = conf.low,
    Conf_High = conf.high,
    P_Value = p.value
  )

kable(model_results, caption = "Logistic Regression Results: Predicting Delay")
Logistic Regression Results: Predicting Delay
Predictor Odds_Ratio std.error statistic P_Value Conf_Low Conf_High
review_rounds_numeric 2.564 0.4031973 2.3350100 0.020 1.230 6.168
external_input_flag 1.997 0.3993084 1.7323926 0.083 0.924 4.453
management_approval_flag 2.030 0.4657472 1.5205829 0.128 0.821 5.168
senior_resource_flag 0.927 0.3877070 -0.1950649 0.845 0.431 1.981
Code
model_fit <- glance(delay_model)
kable(model_fit, caption = "Logistic Regression Model Fit")
Logistic Regression Model Fit
null.deviance df.null logLik AIC BIC deviance df.residual nobs
164.2877 118 -76.96408 163.9282 177.8238 153.9282 114 119
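
The fit table can be summarised with two standard quantities: a likelihood-ratio test of the whole model against the intercept-only model, and McFadden's pseudo R-squared. The sketch below computes both from the figures reported in the table. It assumes the outcome is ungrouped 0/1 data, so that deviance equals minus twice the log-likelihood.

```r
# Values copied from the model fit table above.
null_deviance     <- 164.2877
residual_deviance <- 153.9282
df_null           <- 118
df_residual       <- 114

# Likelihood-ratio statistic: the drop in deviance from adding the predictors.
lr_statistic <- null_deviance - residual_deviance
lr_df        <- df_null - df_residual

# Upper-tail chi-square probability for the whole-model test.
lr_p_value <- pchisq(lr_statistic, df = lr_df, lower.tail = FALSE)

# McFadden's pseudo R-squared: share of the null deviance removed by the model.
mcfadden_r2 <- 1 - residual_deviance / null_deviance

round(c(lr_statistic = lr_statistic, lr_df = lr_df,
        lr_p_value = lr_p_value, mcfadden_r2 = mcfadden_r2), 4)
```

With these inputs the drop in deviance is about 10.36 on 4 degrees of freedom, which is above the 5% chi-square critical value of 9.49, and the pseudo R-squared is about 0.063. The predictors improve on the null model but leave most of the variation in delay unexplained.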

9.2 Diagnostics and Model Adequacy

Code
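model_data <- model_data %>%
  mutate(
    # Passing newdata keeps predictions aligned row by row with model_data,
    # even if glm() dropped rows with missing predictor values.
    predicted_probability = predict(delay_model, newdata = model_data, type = "response"),
    # Classify as delayed (1) when the predicted probability is at least 0.50.
    predicted_class = if_else(predicted_probability >= 0.5, 1, 0)
  )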

confusion_table <- table(
  Actual = model_data$appeared_delayed_flag,
  Predicted = model_data$predicted_class
)

confusion_df <- as.data.frame(confusion_table)
kable(confusion_df, caption = "Classification Table at 0.50 Threshold")
Classification Table at 0.50 Threshold
Actual Predicted Freq
0 0 24
1 0 11
0 1 31
1 1 53
Code
accuracy <- mean(model_data$appeared_delayed_flag == model_data$predicted_class)
tibble(Metric = "Classification accuracy", Value = percent(accuracy, accuracy = 0.1)) %>%
  kable(caption = "Simple Model Classification Accuracy")
Simple Model Classification Accuracy
Metric Value
Classification accuracy 64.7%
Code
predictor_cor <- model_data %>%
  select(review_rounds_numeric, external_input_flag, management_approval_flag, senior_resource_flag) %>%
  cor(use = "pairwise.complete.obs", method = "spearman")

ggcorrplot(
  predictor_cor,
  lab = TRUE,
  type = "lower",
  title = "Predictor Correlation Screen"
)

The regression diagnostics are interpreted cautiously because the dataset is modest in size (119 records entered the model). The model is useful for operational insight, not for final proof of causality. The binary outcome makes logistic regression appropriate. The predictor correlation screen helps check that the model is not simply repeating the same predictor under different names. The classification table gives a basic sense of how well the model separates delayed from non-delayed matters at a 0.50 threshold: 77 of 119 matters (64.7%) are classified correctly, but 31 non-delayed matters are predicted as delayed, so the model over-predicts delay.
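
Accuracy alone can hide how the errors are distributed. As a rough sketch, the counts in the classification table can be turned into sensitivity, specificity and a majority-class baseline. The variable names below are illustrative; the counts are copied from the table.

```r
# Counts copied from the classification table above (0.50 threshold).
true_negative  <- 24  # actual 0, predicted 0
false_negative <- 11  # actual 1, predicted 0
false_positive <- 31  # actual 0, predicted 1
true_positive  <- 53  # actual 1, predicted 1

n_total <- true_negative + false_negative + false_positive + true_positive

classification_metrics <- c(
  accuracy    = (true_positive + true_negative) / n_total,
  # Share of delayed matters the model flags as delayed.
  sensitivity = true_positive / (true_positive + false_negative),
  # Share of non-delayed matters the model correctly leaves unflagged.
  specificity = true_negative / (true_negative + false_positive),
  # Accuracy from always predicting the majority class (delayed).
  baseline    = (true_positive + false_negative) / n_total
)

round(classification_metrics, 3)
```

On these counts sensitivity is about 0.828 and specificity about 0.436, against a majority-class baseline of about 0.538. The model is better at catching delayed matters than at clearing non-delayed ones.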

9.3 Regression Interpretation

The regression should be interpreted in odds ratios. An odds ratio above 1 means the variable is associated with higher odds of delay. An odds ratio below 1 means the variable is associated with lower odds of delay.
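
Odds ratios are easier to read once translated into probabilities. The sketch below applies the reported odds ratio for review rounds to a few baseline probabilities of delay. The baseline probabilities are hypothetical, and a "one unit increase" means one unit of review_rounds_numeric, however that variable is coded in the dataset.

```r
# Odds ratio for review rounds, copied from the regression table.
odds_ratio_review <- 2.564

# Hypothetical baseline probabilities of delay (illustrative only).
baseline_probability <- c(0.25, 0.50, 0.75)

# Convert probability to odds, apply the odds ratio, convert back.
baseline_odds   <- baseline_probability / (1 - baseline_probability)
new_odds        <- baseline_odds * odds_ratio_review
new_probability <- new_odds / (1 + new_odds)

data.frame(
  baseline_probability,
  probability_after_one_unit_increase = round(new_probability, 3)
)
```

For example, a matter with a 0.50 baseline probability of delay moves to roughly 0.72 after a one-unit increase, holding the other predictors fixed. The same odds ratio produces a smaller absolute change when the baseline is already high.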

Review rounds provide the clearest statistical signal in the analysis. The odds ratio is 2.564 (95% confidence interval 1.230 to 6.168) and the p-value is 0.020, below 0.05. This suggests that more review rounds are associated with higher odds that a matter will be delayed, after accounting for external input, management approval and workload group.

External input (odds ratio 1.997, p = 0.083) and management approval (odds ratio 2.030, p = 0.128) also have odds ratios above 1, which is consistent with the business logic that dependencies and approvals increase delay risk. However, their p-values do not reach the 5% level and their confidence intervals include 1. The workload concentration variable (odds ratio 0.927, p = 0.845) does not show a statistically significant direct effect in the model, which reinforces the need to treat workload as a capacity risk rather than as a standalone proven cause of delay.

The practical interpretation is that delay is most likely a multi-factor operational issue. Matters appear to become slower when complexity, repeated review cycles, approvals, dependencies and senior-resource concentration overlap, although the model contains no interaction terms and so does not test that overlap directly.

10. Integrated Findings and Recommendations

The five analytical techniques point to one integrated conclusion: legal request delay is not caused by one simple factor. It is a system issue.

EDA showed that the department handled 123 requests, with a 54.5% delay rate (67 delayed). It also showed that turnaround days are skewed: across the 85 records usable for turnaround analysis, the median turnaround was 3 days and the average was 9.7 days. This means many matters are completed quickly, but some long-running matters pull the average upward.

Visualisation showed that demand is concentrated around contract review, litigation/dispute support, governance/company secretarial support, regulatory support and contract drafting. It also showed that workload is concentrated, with one senior legal resource handling 48.8% of all matters.

The hypothesis tests showed that request type, external input and workload concentration do not independently explain delay at the 5% significance level. This means the department should avoid simplistic conclusions.

Correlation analysis showed that the relationships among the variables are moderate or weak, which again supports the view that no single operational factor fully explains delay.

The logistic regression produced the strongest statistical result. Review rounds were a significant predictor of delay. This suggests that repeated review cycles, incomplete instructions, stakeholder back-and-forth and rework are important drivers of delay.

Taken together, the evidence supports a recommendation focused on triage, intake discipline and capacity support.

10.1 Findings-to-Recommendations Bridge

Code
findings_bridge <- tibble(
  Analytical_Finding = c(
    "Delay is material",
    "Turnaround is skewed",
    "Request type alone does not explain delay",
    "Review rounds increase delay risk",
    "External input is not independently significant but creates dependency risk",
    "Workload concentration is high",
    "Workload concentration does not statistically prove delay",
    "Management approval is an operational friction point"
  ),
  Evidence = c(
    "67 out of 123 requests delayed, representing 54.5%",
    "Mean turnaround is 9.7 days, median is 3 days",
    "H1 p-value is above 0.05",
    "Logistic regression shows review rounds as a significant predictor",
    "H3 not significant, but dependency remains operationally relevant",
    "One senior resource handled 48.8% of all matters",
    "H4 not significant",
    "Regression odds ratio is above 1, though not significant at 5%"
  ),
  Recommendation_Link = c(
    "Legal needs a structured operating model, not informal request handling",
    "Focus on complex/outlier matters rather than treating all matters the same",
    "Use triage based on complexity, urgency and dependency, not request label alone",
    "Introduce a first-time-right intake protocol",
    "Build escalation triggers for external dependencies",
    "Introduce support-and-supervise capacity model",
    "Do not request permanent headcount purely on this test; use temporary support and monitor 2026 data",
    "Build approval status into matter triage and tracking"
  )
)

kable(findings_bridge, caption = "Findings-to-Recommendations Bridge")
Findings-to-Recommendations Bridge
Analytical_Finding Evidence Recommendation_Link
Delay is material 67 out of 123 requests delayed, representing 54.5% Legal needs a structured operating model, not informal request handling
Turnaround is skewed Mean turnaround is 9.7 days, median is 3 days Focus on complex/outlier matters rather than treating all matters the same
Request type alone does not explain delay H1 p-value is above 0.05 Use triage based on complexity, urgency and dependency, not request label alone
Review rounds increase delay risk Logistic regression shows review rounds as a significant predictor Introduce a first-time-right intake protocol
External input is not independently significant but creates dependency risk H3 not significant, but dependency remains operationally relevant Build escalation triggers for external dependencies
Workload concentration is high One senior resource handled 48.8% of all matters Introduce support-and-supervise capacity model
Workload concentration does not statistically prove delay H4 not significant Do not request permanent headcount purely on this test; use temporary support and monitor 2026 data
Management approval is an operational friction point Regression odds ratio is above 1, though not significant at 5% Build approval status into matter triage and tracking

10.2 Strategic Recommendations for Operational Efficiency

Recommendation 1: Implement a Structured Triage and Tiering Model

The Legal Department should adopt a structured traffic-light intake system to classify legal requests by complexity, urgency and dependency. This is preferable to applying one generic service timeline to all matters.

Green: Fast-Track Matters.
These are routine, low-risk and well-documented matters that can move quickly. Examples include simple confirmations, standard letters, basic document checks, low-risk contract administration and straightforward NDA-type reviews. These matters can have a short target turnaround, provided the requester submits complete information at the point of intake.

Amber: Standard Matters.
These are matters requiring moderate legal review, business input, external party responses or management approval. They should have defined target timelines, but with clear escalation triggers where required inputs are not received within agreed periods.

Red: Complex or Strategic Matters.
These are high-value, high-risk, sensitive or business-critical matters requiring senior legal oversight. These matters should not be forced into a generic short SLA. They should be managed through milestone tracking, clear ownership, periodic status updates and escalation where needed.

Data-backed rationale.
The hypothesis test on request type did not show that request type alone explains delay. This means the department should not assume that all contract reviews, regulatory matters or litigation support requests behave the same way. The better approach is to classify matters based on complexity, dependency, approval requirement and urgency.
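
The tiering logic can be sketched as a small rule function. The function name, its four inputs and the order of the rules are hypothetical assumptions made for illustration. They are not the department's agreed criteria, which would need to be defined with the business.

```r
# Hypothetical triage rules: field names and rule order are illustrative
# assumptions, not the department's agreed criteria.
classify_request <- function(high_risk, needs_external_input,
                             needs_management_approval, documents_complete) {
  # Red: high-value, high-risk, sensitive or business-critical matters.
  if (high_risk) {
    return("Red")
  }
  # Amber: any dependency, approval requirement or incomplete documentation.
  if (needs_external_input || needs_management_approval || !documents_complete) {
    return("Amber")
  }
  # Green: routine, low-risk and well-documented matters.
  "Green"
}

green_example <- classify_request(FALSE, FALSE, FALSE, TRUE)
amber_example <- classify_request(FALSE, TRUE, FALSE, TRUE)
red_example   <- classify_request(TRUE, TRUE, TRUE, TRUE)
c(green_example, amber_example, red_example)
```

The point of the sketch is that the tier depends on risk, dependency and completeness rather than on the request label, which is the approach the H1 result supports.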

Recommendation 2: Adopt a First-Time-Right Intake Protocol

The department should introduce a standard intake checklist. Legal timelines should only begin once the requester provides the information required for meaningful legal review.

Code
intake_checklist <- tibble(
  Required_Information = c(
    "Clear description of the request",
    "Relevant documents",
    "Commercial context",
    "Urgency and deadline rationale",
    "Internal approvals already obtained or required",
    "External party contact details",
    "Risk level or business impact"
  ),
  Why_It_Matters = c(
    "Helps Legal understand the business ask",
    "Reduces back-and-forth",
    "Allows Legal to give practical advice",
    "Supports proper prioritisation",
    "Prevents late-stage approval bottlenecks",
    "Speeds up follow-up where third-party input is required",
    "Helps classify the matter as Green, Amber or Red"
  )
)

kable(intake_checklist, caption = "Proposed First-Time-Right Intake Checklist")
Proposed First-Time-Right Intake Checklist
Required_Information Why_It_Matters
Clear description of the request Helps Legal understand the business ask
Relevant documents Reduces back-and-forth
Commercial context Allows Legal to give practical advice
Urgency and deadline rationale Supports proper prioritisation
Internal approvals already obtained or required Prevents late-stage approval bottlenecks
External party contact details Speeds up follow-up where third-party input is required
Risk level or business impact Helps classify the matter as Green, Amber or Red

Data-backed rationale.
Review rounds were the strongest statistical predictor of delay in the logistic regression. This supports the introduction of a first-time-right process. If requesters provide complete information from the beginning, Legal can reduce avoidable rework, unnecessary back-and-forth and late-stage stakeholder alignment.
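
One way to operationalise the checklist is a completeness check that runs before the legal timeline starts. The sketch below is a minimal illustration: the field names, the helper `missing_intake_fields()` and the example request are all hypothetical, and some items (such as external party contacts) may legitimately not apply to every request.

```r
# Field names are hypothetical labels for the seven checklist items.
required_fields <- c(
  "description", "documents", "commercial_context", "deadline_rationale",
  "approvals", "external_contacts", "risk_level"
)

# Returns the checklist items that are absent or blank in a request.
missing_intake_fields <- function(request) {
  is_blank <- vapply(required_fields, function(field) {
    value <- request[[field]]
    is.null(value) || length(value) == 0 || is.na(value[1]) ||
      !nzchar(trimws(as.character(value[1])))
  }, logical(1))
  required_fields[is_blank]
}

# Hypothetical request with two items missing.
example_request <- list(
  description = "Review distributor agreement",
  documents = "draft_agreement.docx",
  commercial_context = "Renewal of existing distributor",
  deadline_rationale = "",
  approvals = "Sales Director approval obtained",
  external_contacts = NA,
  risk_level = "Medium"
)

missing_fields <- missing_intake_fields(example_request)
timeline_can_start <- length(missing_fields) == 0
missing_fields
timeline_can_start
```

Returning the list of missing items, rather than a simple pass or fail, gives the requester a specific action and gives Legal a record of requester-side gaps.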

Recommendation 3: Introduce a Support-and-Supervise Capacity Model

The department should address workload concentration by separating senior legal judgement from supportable legal process work.

The analysis shows that one senior legal resource handled 48.8% of all matters. The hypothesis test does not prove that this workload concentration alone causes delay. However, the concentration is still a business risk, especially where the senior resource is handling complex, sensitive and approval-heavy matters.

In the current business context, the recommendation should not be framed as a request for immediate permanent headcount. The company currently has an employment freeze, and the Legal Department has already exceeded its approved workload count. A permanent resource request may therefore not be feasible in 2026.

The practical 2026 solution is to introduce supervised temporary support through NYSC lawyers or fixed-term contract counsel.

Code
support_supervise <- tibble(
  Supportable_Work = c(
    "Initial document review",
    "First drafts",
    "Document comparison",
    "Version control",
    "Tracker updates",
    "Follow-ups with business teams",
    "First-level research notes"
  ),
  Senior_Lawyer_Retains = c(
    "Final legal position",
    "Strategic judgement",
    "Negotiation strategy",
    "Sensitive stakeholder engagement",
    "Final sign-off",
    "Escalation decisions",
    "Risk-based advice"
  )
)

kable(support_supervise, caption = "Support-and-Supervise Capacity Model")
Support-and-Supervise Capacity Model
Supportable_Work Senior_Lawyer_Retains
Initial document review Final legal position
First drafts Strategic judgement
Document comparison Negotiation strategy
Version control Sensitive stakeholder engagement
Tracker updates Final sign-off
Follow-ups with business teams Escalation decisions
First-level research notes Risk-based advice

This model does not dilute legal quality. It protects it. Senior lawyers continue to own the judgement-heavy work, while support resources handle process-heavy tasks that currently consume senior capacity.

Data-backed rationale.
The workload concentration hypothesis was not statistically significant, so the analysis does not justify saying that workload concentration alone causes delay. However, the EDA shows material concentration of work in one senior resource. When combined with qualitative delay comments around urgent priorities and capacity constraints, this supports a capacity-management intervention.

The recommendation is therefore a temporary capacity bridge for 2026. The department should track the effect of this model during the year. If the data continues to show structural workload pressure, it can support a more permanent resource case during the 2027 annual planning cycle.

10.3 Business Impact of the Recommendations

Code
business_impact <- tibble(
  Recommendation = c(
    "Triage and tiering model",
    "First-time-right intake protocol",
    "Support-and-supervise capacity model"
  ),
  Expected_Business_Impact = c(
    "Improves prioritisation and ensures complex matters are managed differently from routine requests",
    "Reduces avoidable review cycles, incomplete submissions and back-and-forth",
    "Increases processing capacity without breaching the 2026 employment freeze or approved workload count"
  )
)

kable(business_impact, caption = "Expected Business Impact")
Expected Business Impact
Recommendation Expected_Business_Impact
Triage and tiering model Improves prioritisation and ensures complex matters are managed differently from routine requests
First-time-right intake protocol Reduces avoidable review cycles, incomplete submissions and back-and-forth
Support-and-supervise capacity model Increases processing capacity without breaching the 2026 employment freeze or approved workload count

Together, these recommendations move the department from a reactive request-handling model to a more disciplined legal operations model. The aim is not simply to ask for more resources. The aim is to use data to improve workflow design, protect senior legal capacity, reduce avoidable friction and support faster business execution.

11. Limitations and Further Work

This analysis has important limitations.

First, the dataset is based on legal requests captured from available records and emails. It may not capture every informal request made through calls, WhatsApp messages, corridor conversations or verbal escalations.

Second, some matters were still open or did not have clear completion dates. These records were excluded from turnaround-day analysis, although they remained relevant for delay-status analysis.

Third, the delay variable is partly judgement-based. “Appeared delayed” was determined from the available records, but future analysis would benefit from a formally agreed SLA baseline for each request type.

Fourth, the dataset does not yet include a direct matter complexity score. Complexity had to be inferred through proxies such as review rounds, external input and management approval.

Fifth, responsible lawyer should not be interpreted as individual performance scoring. Some lawyers may handle more complex, sensitive or senior-level work, which naturally affects turnaround.

Future analysis should add a formal complexity score, SLA target, SLA start date, SLA pause periods, requester-caused delay, Legal-caused delay and business impact rating. This would make it easier to separate delay within Legal’s control from delay caused by business stakeholders, management approvals, regulators, vendors or external counsel.
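
As a sketch of how the proposed fields could be used, the example below separates gross turnaround from turnaround within Legal's control by subtracting paused days. All column names and values are hypothetical and only illustrate the calculation.

```r
# Hypothetical records: column names and values are illustrative only.
matters <- data.frame(
  matter_id       = c("M1", "M2", "M3"),
  sla_start_date  = as.Date(c("2026-01-05", "2026-01-12", "2026-02-02")),
  completion_date = as.Date(c("2026-01-08", "2026-02-02", "2026-02-20")),
  sla_paused_days = c(0, 14, 5),  # days waiting on requester or third party
  sla_target_days = c(3, 10, 10)
)

# Gross turnaround counts every calendar day between start and completion.
matters$gross_turnaround_days <-
  as.numeric(matters$completion_date - matters$sla_start_date)

# Net turnaround removes the days when the SLA clock was paused.
matters$net_turnaround_days <-
  matters$gross_turnaround_days - matters$sla_paused_days

# A matter breaches the SLA only if time within Legal's control exceeds target.
matters$sla_breached <- matters$net_turnaround_days > matters$sla_target_days

matters[, c("matter_id", "gross_turnaround_days",
            "net_turnaround_days", "sla_breached")]
```

In this toy example the second matter takes 21 calendar days but only 7 days within Legal's control, so it would not count as a breach against a 10-day target.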

The department should also repeat the analysis after implementing the triage and intake framework to test whether delay rates, review rounds and average turnaround improve.
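
A follow-up comparison of delay rates could use a two-sample test of proportions. In the sketch below the baseline counts come from this case study, while the follow-up counts are invented purely for illustration.

```r
# Baseline figures from this case study: 67 of 123 requests delayed.
baseline_delayed <- 67
baseline_total   <- 123

# Hypothetical follow-up figures, invented purely for illustration.
followup_delayed <- 40
followup_total   <- 110

# Two-sample test of equal proportions (delay rate before vs after).
delay_rate_test <- prop.test(
  x = c(baseline_delayed, followup_delayed),
  n = c(baseline_total, followup_total)
)

delay_rate_test$estimate  # the two delay rates
delay_rate_test$p.value   # evidence against "no change in delay rate"
```

Turnaround days are skewed, so a before-and-after comparison of turnaround would be better served by a rank-based test such as wilcox.test(), in line with the Mann-Whitney tests used earlier.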

12. References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making: From data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online.

Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048

Nwandem, O. (2026). Anonymised legal requests dataset: January 2025 to April 2026 [Dataset]. Collected from Nigerian Breweries Plc Legal Department, Lagos, Nigeria. Data available on request from the author.

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x). R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S., Müller, K., Ooms, J., Robinson, D., Seidel, D., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Appendix: AI Usage Statement

I used AI tools to support the structuring of this case study, refine the business interpretation of statistical outputs, and assist with drafting reproducible Quarto code. The analytical decisions, including the choice of dataset, the framing of the business problem, the selection of hypotheses, the interpretation of the results and the final recommendations, were reviewed and adapted by me based on my understanding of the Legal Department’s operations and the practical business context.