What Operational Factors Influence the Duration of Legal Matters, and What Process Improvements Can Reduce Resolution Time?

Author

Elizabeth Ero

Published

May 8, 2026

1. Executive Summary

This study examines the operational factors influencing the duration of legal matters within an upstream oil and gas legal environment and identifies process improvement opportunities that can reduce resolution time.

The analysis uses anonymised legal matter data collected from internal legal operations records. The study applies exploratory data analysis, visualisation, hypothesis testing, correlation analysis, and regression modelling to determine which operational variables significantly influence matter duration.

The findings indicate that legal matter duration is most strongly associated with the number of revision cycles. Complexity and approval layers are positively correlated with duration when considered individually, but neither is statistically significant once revision count is controlled for. The study further provides recommendations for reducing turnaround time through process optimisation and workflow simplification.

2. Professional Disclosure

Professional Background

I am an upstream oil and gas lawyer involved in legal advisory, contract drafting and review, transaction support, and regulatory compliance activities within the energy sector.

This study is directly relevant to my professional responsibilities because legal turnaround time affects transaction execution, regulatory timelines, and operational efficiency.

Operational Relevance of Techniques

Exploratory Data Analysis (EDA)

EDA assists legal operations by identifying patterns, inconsistencies, and operational bottlenecks within legal matter workflows.

Data Visualisation

Visualisation enables management to understand trends in legal matter duration and identify operational inefficiencies.

Hypothesis Testing

Hypothesis testing supports evidence-based decisions regarding legal workflows and counsel allocation.

Correlation Analysis

Correlation analysis identifies relationships between operational variables and matter duration.

Regression Analysis

Regression analysis quantifies the operational impact of multiple variables simultaneously.

3. Data Collection and Sampling

Data Source

The dataset was collected from anonymised internal legal matter records, including:

  • Contract review trackers
  • Matter management logs
  • Approval workflow records
  • Legal operations reports

Sampling Method

A purposive sampling approach was adopted to ensure representation across multiple legal matter categories.

The dataset contains 100 legal matters handled between May 2021 and May 2026.

Variables Collected

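Variable          Type         Description
----------------  -----------  ------------------------------------------------
matter_id         Identifier   Unique anonymised matter reference
duration_days     Numeric      Number of days between opening and closure
complexity_score  Numeric      Complexity rating (1–5)
revision_count    Numeric      Number of negotiation/review cycles
approval_layers   Numeric      Number of approval stages
counsel_type      Categorical  In-house or External
matter_type       Categorical  Commercial, Litigation, Regulatory, Employment
open_date         Date         Date matter commenced

The imported file uses dotted column names (for example Duration.Days and Counsel.Type), identifies each matter by Matter.Name, codes in-house counsel as Internal, and includes an additional Drafting.Status column. Matter.Type takes only two values in this sample, one of which is Technical.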

Ethical Considerations

All legal matters were anonymised prior to analysis. No commercially sensitive information was disclosed.

4. Data Description

Load Required Packages

Code
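library(tidyverse)  # data manipulation (dplyr) and plotting (ggplot2)
library(lubridate)  # date parsing helpers
library(corrplot)   # correlation heatmap
library(effsize)    # Cohen's d effect size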

Import Dataset

Code
legal_data <- read.csv("Data Analytics Exam - Legal Matters Data.csv")

View Dataset

Code
head(legal_data)
                                                   Matter.Name Duration.Days
1        Contract for ECM Study of Planned Drilling Activiites            29
2         Contract for Drilling Site Location Preparation Work             2
3       Contract for Security Risk Assessment Plan of IB Field            20
4 Contract for the Deployment of Lightning Arrestor Protection            12
5           Contract for ESR, PIAR and EES Study of Assa Field             6
6                             Contract for Noise Mapping Study             5
  Complexity.Score Approval.Layers Revision.Count Counsel.Type Matter.Type
1                2               3              4     Internal   Technical
2                2               2              1     Internal   Technical
3                2               3              1     Internal   Technical
4                2               3              2     Internal   Technical
5                2               3              1     Internal   Technical
6                2               3              1     Internal   Technical
  Drafting.Status  Open.Date
1       Completed 17/05/2021
2       Completed 25/05/2021
3       Completed   8/6/2021
4       Completed  12/6/2021
5       Completed 15/06/2021
6       Completed 16/06/2021

Dataset Structure

Code
str(legal_data)
'data.frame':   100 obs. of  9 variables:
 $ Matter.Name     : chr  "Contract for ECM Study of Planned Drilling Activiites" "Contract for Drilling Site Location Preparation Work" "Contract for Security Risk Assessment Plan of IB Field" "Contract for the Deployment of Lightning Arrestor Protection" ...
 $ Duration.Days   : int  29 2 20 12 6 5 8 3 4 5 ...
 $ Complexity.Score: int  2 2 2 2 2 2 2 2 2 2 ...
 $ Approval.Layers : int  3 2 3 3 3 3 3 3 3 3 ...
 $ Revision.Count  : int  4 1 1 2 1 1 2 1 1 1 ...
 $ Counsel.Type    : chr  "Internal" "Internal" "Internal" "Internal" ...
 $ Matter.Type     : chr  "Technical" "Technical" "Technical" "Technical" ...
 $ Drafting.Status : chr  "Completed" "Completed" "Completed" "Completed" ...
 $ Open.Date       : chr  "17/05/2021" "25/05/2021" "8/6/2021" "12/6/2021" ...
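The structure output shows that Open.Date is stored as text, with and without leading zeros. A minimal sketch of converting it to a proper date follows, using only base R and four sample values copied from the output above. It assumes every entry is in day/month/year order, which the unambiguous values (such as 17/05/2021) suggest but which has not been checked against the full file.

```r
# Sample values copied from the head(legal_data) output above
open_date_raw <- c("17/05/2021", "25/05/2021", "8/6/2021", "12/6/2021")

# Assumption: all entries are day/month/year.
# as.Date accepts days and months without leading zeros.
open_date <- as.Date(open_date_raw, format = "%d/%m/%Y")
print(open_date)

# Derived fields that could support trend analysis by period
open_year  <- as.integer(format(open_date, "%Y"))
open_month <- as.integer(format(open_date, "%m"))
print(data.frame(open_date, open_year, open_month))

# With the full data loaded, the same conversion would be:
# legal_data$Open.Date <- as.Date(legal_data$Open.Date, format = "%d/%m/%Y")
```

Any entry that does not match the assumed format becomes NA, so counting missing values after the conversion is a quick way to test the assumption.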

Summary Statistics

Code
summary(legal_data)
 Matter.Name        Duration.Days Complexity.Score Approval.Layers
 Length:100         Min.   :  1   Min.   :1.00     Min.   :1.00   
 Class :character   1st Qu.:  6   1st Qu.:2.00     1st Qu.:3.00   
 Mode  :character   Median : 12   Median :2.00     Median :3.00   
                    Mean   : 27   Mean   :2.34     Mean   :2.98   
                    3rd Qu.: 24   3rd Qu.:3.00     3rd Qu.:3.00   
                    Max.   :331   Max.   :5.00     Max.   :5.00   
 Revision.Count  Counsel.Type       Matter.Type        Drafting.Status   
 Min.   : 0.00   Length:100         Length:100         Length:100        
 1st Qu.: 1.00   Class :character   Class :character   Class :character  
 Median : 2.00   Mode  :character   Mode  :character   Mode  :character  
 Mean   : 3.58                                                           
 3rd Qu.: 3.00                                                           
 Max.   :38.00                                                           
  Open.Date        
 Length:100        
 Class :character  
 Mode  :character  
                   
                   
                   

Missing Values

Code
colSums(is.na(legal_data))
     Matter.Name    Duration.Days Complexity.Score  Approval.Layers 
               0                0                0                0 
  Revision.Count     Counsel.Type      Matter.Type  Drafting.Status 
               0                0                0                0 
       Open.Date 
               0 

Data Types

Code
sapply(legal_data, class)
     Matter.Name    Duration.Days Complexity.Score  Approval.Layers 
     "character"        "integer"        "integer"        "integer" 
  Revision.Count     Counsel.Type      Matter.Type  Drafting.Status 
       "integer"      "character"      "character"      "character" 
       Open.Date 
     "character" 

5. Exploratory Data Analysis (EDA)

Distribution of Matter Duration

Code
ggplot(legal_data, aes(x = `Duration.Days`)) +
  geom_histogram(bins = 20, fill = "steelblue") +
  labs(
    title = "Distribution of Legal Matter Duration",
    x = "Duration (Days)",
    y = "Frequency"
  )

Interpretation

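The histogram shows whether matter duration is normally distributed or skewed. The summary statistics point to a strongly right-skewed distribution: the median duration is 12 days and the mean is 27 days, while the longest matter ran for 331 days. Most matters are therefore resolved quickly, with a small number of long-running matters forming a long right tail.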
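Skewness can also be quantified rather than judged by eye. The sketch below is illustrative only: the duration vector is hypothetical (it is not the study data), and `skewness` is a small helper defined here using the moment-based formula, not a function from any package loaded in this report.

```r
# Hypothetical durations in days (illustrative only, not the study data)
duration_days <- c(2, 3, 4, 5, 5, 6, 8, 12, 20, 29, 45, 120)

# Moment-based sample skewness: values above 0 indicate a longer right tail
skewness <- function(x) {
  centred <- x - mean(x)
  mean(centred^3) / mean(centred^2)^1.5
}

raw_skew <- skewness(duration_days)
log_skew <- skewness(log(duration_days))

cat("Mean:", mean(duration_days), " Median:", median(duration_days), "\n")
cat("Skewness (raw):", round(raw_skew, 2), "\n")
cat("Skewness (log):", round(log_skew, 2), "\n")
```

If the log-transformed values are much closer to symmetric, as they are for this hypothetical vector, modelling log duration may be worth considering for the tests and regression that follow.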

Outlier Detection

Code
ggplot(legal_data, aes(y = `Duration.Days`)) +
  geom_boxplot(fill = "orange") +
  labs(
    title = "Outlier Detection for Matter Duration",
    y = "Duration (Days)"
  )

Interpretation

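This boxplot identifies unusually long-running legal matters. With a first quartile of 6 days and a third quartile of 24 days, the interquartile range is 18 days, so matters lasting more than 51 days (24 + 1.5 × 18) are plotted as outliers. The maximum of 331 days lies far beyond that threshold.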
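The outlier threshold used by a standard boxplot can be reproduced from the quartiles reported in the summary statistics. This is a small worked calculation using the 1.5 × IQR rule, which is the default whisker rule in `geom_boxplot`; the variable names are chosen here for illustration.

```r
# Quartiles and maximum of Duration.Days as reported by summary(legal_data)
q1 <- 6
q3 <- 24
max_duration <- 331

# Tukey's rule: points beyond 1.5 * IQR from the quartiles are outliers
iqr <- q3 - q1
upper_fence <- q3 + 1.5 * iqr
lower_fence <- q1 - 1.5 * iqr

cat("IQR:", iqr, "days\n")
cat("Upper fence:", upper_fence, "days\n")
cat("Lower fence:", lower_fence, "days (not binding, durations are positive)\n")
cat("Maximum exceeds upper fence:", max_duration > upper_fence, "\n")

# With the data loaded, the flagged matters could be listed with:
# subset(legal_data, Duration.Days > upper_fence)
```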

Duration by Counsel Type

Code
ggplot(legal_data, aes(x = `Counsel.Type`, y = `Duration.Days`)) +
  geom_boxplot(fill = "lightgreen") +
  labs(
    title = "Duration by Counsel Type",
    x = "Counsel Type",
    y = "Duration (Days)"
  )

Interpretation

This visualisation compares matter duration between internal (in-house) and external counsel.

Duration by Matter Type

Code
ggplot(legal_data, aes(x = `Matter.Type`, y = `Duration.Days`)) +
  geom_boxplot(fill = "lightblue") +
  labs(
    title = "Duration by Matter Type",
    x = "Matter Type",
    y = "Duration (Days)"
  )

Interpretation

This chart compares legal matter duration across different categories.

6. Data Visualisation

Complexity Score vs Duration

Code
ggplot(legal_data, aes(x = `Complexity.Score`, y = `Duration.Days`)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red") +
  labs(
    title = "Complexity Score vs Duration",
    x = "Complexity Score",
    y = "Duration (Days)"
  )

Interpretation

This visualisation evaluates whether higher complexity is associated with longer legal matter duration.

Revision Count vs Duration

Code
ggplot(legal_data, aes(x = `Revision.Count`, y = `Duration.Days`)) +
  geom_point(color = "darkgreen") +
  geom_smooth(method = "lm", color = "red") +
  labs(
    title = "Revision Count vs Duration",
    x = "Revision Count",
    y = "Duration (Days)"
  )

Interpretation

This chart examines whether additional negotiation and review cycles are associated with longer matter duration.

Approval Layers vs Duration

Code
ggplot(legal_data, aes(x = `Approval.Layers`, y = `Duration.Days`)) +
  geom_point(color = "purple") +
  geom_smooth(method = "lm", color = "red") +
  labs(
    title = "Approval Layers vs Duration",
    x = "Approval Layers",
    y = "Duration (Days)"
  )

Interpretation

This visualisation assesses whether a greater number of approval layers is associated with longer turnaround time.

7. Hypothesis Testing

Hypothesis 1

Research Objective

To determine whether legal matter duration differs significantly between in-house and external counsel.

Hypotheses

  • H0: There is no significant difference in matter duration between counsel types.
  • H1: There is a significant difference in matter duration between counsel types.

t-Test

Code
t_test_result <- t.test(`Duration.Days` ~ `Counsel.Type`, data = legal_data)

t_test_result

    Welch Two Sample t-test

data:  Duration.Days by Counsel.Type
t = 2.6652, df = 6.0561, p-value = 0.03693
alternative hypothesis: true difference in means between group External and group Internal is not equal to 0
95 percent confidence interval:
   9.374622 213.974072
sample estimates:
mean in group External mean in group Internal 
              130.8571                19.1828 

Effect Size

Code
cohen.d(`Duration.Days` ~ `Counsel.Type`, data = legal_data)

Cohen's d

d estimate: 2.922562 (large)
95 percent confidence interval:
   lower    upper 
2.043291 3.801833 

Interpretation

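The test rejects H0 at the 5% level (t = 2.67, p = 0.037): external-counsel matters averaged 130.9 days against 19.2 days for internal counsel, and Cohen's d of 2.92 indicates a large effect. The result should be read with caution. The Welch degrees of freedom (about 6) suggest that the external group is very small, the 95% confidence interval for the difference is wide (9.4 to 214.0 days), and duration is heavily right-skewed, which strains the normality assumption behind the t-test.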
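Because duration is skewed and one counsel group appears to be small, a rank-based test and a test on log durations are natural robustness checks. The sketch below is a hedged illustration on a hypothetical data frame (`toy`, not the study data), using `wilcox.test` and `t.test` from base R's stats package.

```r
# Hypothetical durations (illustrative only, not the study data)
toy <- data.frame(
  duration_days = c(2, 3, 4, 5, 6, 8, 12, 14, 20, 29, 45, 60, 90, 150, 331),
  counsel_type  = rep(c("Internal", "External"), times = c(10, 5))
)

# Group sizes matter: a very small group limits what any test can show
print(table(toy$counsel_type))

# Wilcoxon rank-sum test: based on ranks, so insensitive to the long right tail
rank_test <- wilcox.test(duration_days ~ counsel_type, data = toy)
print(rank_test)

# Welch t-test on log-transformed durations as a second check
log_test <- t.test(log(duration_days) ~ counsel_type, data = toy)
print(log_test$p.value)
```

On the real data the equivalent calls would use `Duration.Days ~ Counsel.Type` with `data = legal_data`. If all three tests agree, the conclusion does not hinge on the normality assumption.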

Hypothesis 2

Research Objective

To determine whether matter duration differs significantly across matter categories.

Hypotheses

  • H0: Mean duration does not differ across matter types.
  • H1: Mean duration differs across matter types.

ANOVA

Code
anova_model <- aov(`Duration.Days` ~ `Matter.Type`, data = legal_data)

summary(anova_model)
            Df Sum Sq Mean Sq F value   Pr(>F)    
Matter.Type  1  51338   51338   29.09 4.79e-07 ***
Residuals   98 172938    1765                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation

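The ANOVA rejects H0 (F(1, 98) = 29.09, p < 0.001), so mean duration differs across matter types. The single degree of freedom for Matter.Type shows that the variable has only two categories in this dataset, which makes the test equivalent to a pooled-variance two-sample t-test.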
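The ANOVA table also allows an effect size to be computed. The worked calculation below derives eta squared (the share of total variance attributable to matter type) from the reported sums of squares and recomputes F as a consistency check. The variable names are chosen here for illustration.

```r
# Sums of squares and degrees of freedom from the ANOVA table above
ss_between <- 51338
ss_within  <- 172938
df_between <- 1
df_within  <- 98

# Eta squared: share of total variance explained by matter type
eta_squared <- ss_between / (ss_between + ss_within)

# Recompute F from the mean squares; it should match the reported 29.09
f_value <- (ss_between / df_between) / (ss_within / df_within)

cat("Eta squared:", round(eta_squared, 3), "\n")
cat("F value:", round(f_value, 2), "\n")
```

Matter type therefore accounts for roughly 23% of the variance in duration when considered on its own.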

8. Correlation Analysis

Select Numeric Variables

Code
numeric_data <- legal_data %>%
  select(`Duration.Days`,
         `Complexity.Score`,
         `Revision.Count`,
         `Approval.Layers`)

Correlation Matrix

Code
cor_matrix <- cor(numeric_data, use = "complete.obs")

cor_matrix
                 Duration.Days Complexity.Score Revision.Count Approval.Layers
Duration.Days        1.0000000        0.6007510      0.8685378       0.3834658
Complexity.Score     0.6007510        1.0000000      0.7520747       0.4640004
Revision.Count       0.8685378        0.7520747      1.0000000       0.4937822
Approval.Layers      0.3834658        0.4640004      0.4937822       1.0000000

Correlation Heatmap

Code
corrplot(
  cor_matrix,
  method = "color",
  addCoef.col = "black"
)

Interpretation

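Revision.Count has the strongest correlation with duration (r = 0.87), followed by Complexity.Score (r = 0.60) and Approval.Layers (r = 0.38). Complexity.Score and Revision.Count are also strongly correlated with each other (r = 0.75), so their individual effects are difficult to separate in a regression.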
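The correlation matrix reports strength but not statistical significance. A short sketch of the standard t-test for a Pearson correlation follows, applied to the coefficients above with n = 100 (the number of observations reported by `str`, with no missing values). The variable names are chosen here for illustration.

```r
# Correlations with Duration.Days from the matrix above; n from str(legal_data)
r <- c(Complexity.Score = 0.6007510,
       Revision.Count   = 0.8685378,
       Approval.Layers  = 0.3834658)
n <- 100

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

print(data.frame(r = r, t = round(t_stat, 2), p = signif(p_value, 3)))

# With the data loaded, cor.test() runs the same test and adds a confidence interval:
# cor.test(legal_data$Duration.Days, legal_data$Revision.Count)
```

All three correlations with duration are significant at the bivariate level, which is worth keeping in mind when comparing them with the regression results, where the predictors are assessed jointly.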

9. Regression Analysis

Regression Model

Code
regression_model <- lm(
  `Duration.Days` ~
    `Counsel.Type`+
    `Matter.Type` +
    `Complexity.Score` +
    `Revision.Count` +
    `Approval.Layers`,
  data = legal_data
)

summary(regression_model)

Call:
lm(formula = Duration.Days ~ Counsel.Type + Matter.Type + Complexity.Score + 
    Revision.Count + Approval.Layers, data = legal_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-66.882 -10.414  -3.544   6.378 118.126 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)           -3.5603    20.0950  -0.177   0.8598    
Counsel.TypeInternal  40.2022    17.6254   2.281   0.0248 *  
Matter.TypeTechnical -13.0302    18.7286  -0.696   0.4883    
Complexity.Score      -5.3980     4.3843  -1.231   0.2213    
Revision.Count         9.1040     0.8127  11.202   <2e-16 ***
Approval.Layers       -4.7918     4.8632  -0.985   0.3270    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.12 on 94 degrees of freedom
Multiple R-squared:  0.7759,    Adjusted R-squared:  0.764 
F-statistic: 65.09 on 5 and 94 DF,  p-value: < 2.2e-16

Interpretation

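This model estimates the combined effect of operational variables on matter duration, and explains about 78% of its variance (R² = 0.776, adjusted R² = 0.764). Revision.Count is the dominant predictor: each additional revision cycle is associated with roughly 9.1 more days, holding the other variables constant (p < 0.001). Counsel.TypeInternal is also significant (p = 0.025) with a positive coefficient of 40.2 days, which reverses the raw difference in group means and suggests that the longer durations of external matters may be largely accounted for by the other predictors, chiefly revision count. Complexity.Score, Approval.Layers, and Matter.Type are not significant once the other variables are included.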
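To make the coefficients concrete, the fitted equation can be evaluated by hand. The sketch below copies the estimates from the regression output and computes the fitted duration for an internal, technical matter at two revision counts. The helper `predict_duration` is defined here for illustration only.

```r
# Coefficients copied from summary(regression_model) above
coefs <- c(intercept = -3.5603, counsel_internal = 40.2022,
           matter_technical = -13.0302, complexity = -5.3980,
           revisions = 9.1040, approvals = -4.7918)

# Fitted duration (days) for an internal, technical matter
predict_duration <- function(complexity, revisions, approvals) {
  unname(coefs["intercept"] + coefs["counsel_internal"] +
           coefs["matter_technical"] + coefs["complexity"] * complexity +
           coefs["revisions"] * revisions + coefs["approvals"] * approvals)
}

one_revision   <- predict_duration(complexity = 2, revisions = 1, approvals = 3)
four_revisions <- predict_duration(complexity = 2, revisions = 4, approvals = 3)

cat("Fitted duration, 1 revision: ", round(one_revision, 1), "days\n")
cat("Fitted duration, 4 revisions:", round(four_revisions, 1), "days\n")
cat("Difference:", round(four_revisions - one_revision, 1), "days\n")
```

The four-revision case matches the profile of the first matter in the `head` output (complexity 2, three approval layers, four revisions), whose observed duration was 29 days. With the model object available, `predict(regression_model, newdata = ...)` would give the same fitted values without copying coefficients.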

Regression Diagnostics

Code
par(mfrow = c(2, 2))

plot(regression_model)

Interpretation

Diagnostic plots assess:

  • normality
  • homoscedasticity
  • linearity
  • influential observations
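Influential observations can also be listed numerically instead of being read off the plots. The sketch below uses simulated data (not the study data) with one deliberately extreme value, and flags points whose Cook's distance exceeds 4/n. That cutoff is a common rule of thumb, adopted here as an assumption and not a threshold taken from this study.

```r
# Simulated data (illustrative only, not the study data)
set.seed(42)
n <- 50
sim <- data.frame(revision_count = rpois(n, 3))
sim$duration_days <- 5 + 9 * sim$revision_count + rnorm(n, sd = 10)

# Inject one extreme matter to mimic a long-running outlier
sim$duration_days[1] <- 331

fit <- lm(duration_days ~ revision_count, data = sim)

# Cook's distance measures how much each observation shifts the fitted model
cooks <- cooks.distance(fit)
flagged <- which(cooks > 4 / n)

print(round(cooks[flagged], 3))
```

On the real model, `cooks.distance(regression_model)` would play the same role. Refitting without the flagged matters shows how far the coefficients depend on a few long-running cases.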

10. Integrated Findings

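The analyses collectively suggest that revision cycles are the main operational driver of legal matter duration. Operational complexity and approval structures are positively correlated with duration, but neither is statistically significant in the regression once revision count is controlled for.

The results indicate that minimising excessive negotiation and revision cycles is the most promising route to improving legal operational efficiency. The evidence that reducing approval layers would shorten turnaround time is weaker.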

11. Limitations & Further Work

This study is limited to one operational environment and may not fully generalise across industries.

Future studies could incorporate:

  • larger datasets
  • multiple organisations
  • predictive modelling
  • time-series analysis

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.r-project.org/

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer.

Appendix: AI Usage Statement

Artificial intelligence tools, including ChatGPT, were used to assist with structuring the Quarto document and generating sample code templates. However, all analytical decisions, interpretations, and recommendations were independently developed by the author.