Non-Productive Time (NPT) Analytics in Offshore Drilling Operations
Author
Hajara Oiza Asuba
Published
May 17, 2026
1. Executive Summary
Non-Productive Time (NPT) represents one of the most significant cost drivers in offshore drilling operations, consuming both rig time and financial resources without generating value. This study analyses 346 NPT incidents recorded across 27 offshore wells operated between 2004 and 2010, covering a combined direct cost of approximately USD 97.5 million. The dataset — extracted from a well-intervention and drilling management system — contains key variables including the date of incident, well name, responsible contractor, operation type, equipment failure description, duration in hours, and associated costs.
Five analytical techniques were applied: (1) Exploratory Data Analysis revealed strong right-skewness in both cost and duration distributions, with outlier incidents individually costing up to USD 5.35 million; (2) Data Visualisation exposed seasonal NPT clustering in Q4 and a disproportionate NPT burden on completion and drilling phases; (3) Hypothesis Testing confirmed a statistically significant difference in mean NPT duration across operation categories (ANOVA: F = 4.21, p < 0.001); (4) Correlation Analysis revealed a strong positive relationship between NPT duration and cost (r = 0.89, p < 0.001); and (5) Linear Regression demonstrated that operation type and contractor group together explain approximately 22% of variance in NPT cost.
The key recommendation is that drilling and completion phase NPT prevention should be prioritised, as these two categories alone account for over 70% of total NPT cost. Contractual performance incentives tied to NPT reduction should be introduced for major service contractors, particularly those managing drilling and completion activities.
2. Professional Disclosure
Job Title: Drilling Data Analyst / Petroleum Engineer Organisation Type: International Upstream Oil & Gas Operating Company Sector: Upstream Petroleum — Offshore Exploration & Production (Nigeria)
Technique Justifications
1. Exploratory Data Analysis (EDA): In upstream operations, drilling data is rarely clean. Equipment failure descriptions are free-form text, duration values span several orders of magnitude, and the same contractor may appear under multiple name variants. EDA is the mandatory first step to understand data quality, identify outliers that may represent genuine high-impact events rather than entry errors, and form the basis for all subsequent inference. As an analyst supporting well planning and cost-estimation, I use EDA to assess historical NPT patterns before budgeting new well campaigns.
2. Data Visualisation: Time-series and categorical bar charts of NPT events are a core delivability in our monthly performance review. Visualising NPT by contractor, by operation phase, and over time allows the drilling superintendent and well manager to rapidly identify where contractual remediation is needed. A single bar chart of NPT cost by contractor is worth ten pages of tables in a management review.
3. Hypothesis Testing: Operations management frequently debates whether NPT rates differ “materially” between phases (drilling vs. completion vs. testing). Formal hypothesis testing — specifically ANOVA and the Kruskal-Wallis non-parametric alternative — replaces subjective interpretation with statistically defensible conclusions that can be included in performance improvement plans and contract negotiations.
4. Correlation Analysis: Cost and time are the two primary NPT metrics, and understanding their joint behaviour — as well as their relationship to structural variables like operation type — is central to early-warning systems. Correlation analysis is used operationally to validate whether cost models can be simplified to single-variable proxies or require full multivariate treatment.
5. Linear Regression: The organisation uses regression-derived cost models to estimate contingency budgets for new wells. A regression of NPT cost on predictors such as operation category and contractor group allows us to quantify expected NPT cost for a proposed well programme, with prediction intervals that inform risk allocation between operator and contractor.
3. Data Collection & Sampling
Source and Collection Method
The dataset (NPT_Cleaned_Dataset.xlsx) was extracted from a drilling operations management database maintained by the organisation’s Well Engineering department. Each row corresponds to a single NPT incident recorded by the on-site drilling engineer and validated by the Well Operations Supervisor before entry into the system. The primary data collection instrument was the Daily Drilling Report (DDR), which is a standard industry document completed 24-hourly for every active well.
Variables
Variable
Type
Description
npt_number
Integer
Unique incident identifier per well
date
Date
Date the NPT event was recorded
well_name
Categorical
Alphanumeric well identifier (anonymised)
contractor
Categorical
Service contractor responsible for the equipment or operation
operation
Categorical
Phase of well operations during which NPT occurred
equipment_failure
Text
Free-text description of the equipment or process failure
time_hrs
Numeric
NPT duration in hours
cost_usd
Numeric
Direct cost attributed to the NPT event (USD)
year
Integer
Calendar year
month
Categorical
Calendar month name
cost_per_hour
Numeric
Rig day-rate divided by 24 (USD/hour)
Sampling Frame
All 346 NPT incidents recorded across 27 wells drilled between May 2004 and December 2010 are included. This constitutes the full population of documented NPT events for the wells in scope — no random sampling was applied. The data coverage is census-complete for the wells within scope, although wells drilled outside this window or in a different asset area are not represented.
Time Period
May 2004 – December 2010 (approximately 6.5 years of operational history).
Ethical Notes
All well names have been anonymised using alphanumeric codes (e.g., AO-1X, OK-10). No personally identifiable information is contained in the dataset. Contractor names are retained as they are corporate entities whose performance data is relevant to contractual accountability. The dataset is used strictly for academic and internal analytical purposes.
4. Data Description
Code
library(tidyverse)library(readxl)library(skimr)library(ggplot2)library(ggcorrplot)library(scales)library(knitr)library(kableExtra)library(car)library(broom)library(patchwork)library(lubridate)library(janitor)# Set a consistent themetheme_set(theme_minimal(base_size =12) +theme(plot.title =element_text(face ="bold", size =13),plot.subtitle =element_text(colour ="grey40"),axis.title =element_text(face ="bold"),legend.position ="bottom" ))
Issue 1 – No missing values detected. The dataset is complete across all 11 variables (346 observations). This reflects the mandatory nature of DDR completion in regulated drilling operations.
Issue 2 – Contractor name duplication. Forty-two raw contractor name variants were identified, many representing the same corporate entity with spelling inconsistencies (e.g., “Halliburton”, “Halliburton Cement”, “Haliburton Slikeline”, “Halliburton Tools”). These were consolidated into six groups: Halliburton, Schlumberger, Saipem, Weatherford, Baker Hughes, and Other.
Issue 3 – Right-skewed duration distribution with extreme values. The maximum NPT duration is 214 hours (approximately 9 days), while the median is only 3 hours. These extreme values represent genuine high-impact well incidents (e.g., stuck pipe, extended BOP testing) — not entry errors — and are retained in the analysis. Log-transformation is applied in regression to manage heteroscedasticity.
5. Technique 1 — Exploratory Data Analysis (EDA)
Theory: EDA (Tukey, 1977) is the practice of using statistical summaries and graphical methods to describe data distributions, identify anomalies, and generate hypotheses before formal modelling. Key tools include measures of central tendency and dispersion, quantile analysis, and outlier detection (Adi, 2026, Ch. 4).
Business Justification: Before committing NPT reduction budgets or renegotiating contractor agreements, the Well Engineering team must first establish the factual distribution of NPT across dimensions (time, operation, contractor, cost). This EDA phase is the evidence base for every subsequent management decision.
Code
p1 <-ggplot(df, aes(x = time_hrs)) +geom_histogram(bins =40, fill ="#2c7bb6", colour ="white") +scale_x_log10(labels =label_comma()) +labs(title ="NPT Duration (log scale)", x ="Duration (hours)", y ="Count") +geom_vline(aes(xintercept =median(time_hrs)), colour ="red", linetype ="dashed", linewidth =0.8) +annotate("text", x =median(df$time_hrs) *1.5, y =50,label =paste0("Median = ", round(median(df$time_hrs), 1), " hrs"),colour ="red", size =3.5)p2 <-ggplot(df, aes(x = cost_usd)) +geom_histogram(bins =40, fill ="#d7191c", colour ="white") +scale_x_log10(labels =label_dollar(scale =1e-3, suffix ="K")) +labs(title ="NPT Cost (log scale)", x ="Cost (USD, log)", y ="Count") +geom_vline(aes(xintercept =median(cost_usd)), colour ="navy", linetype ="dashed", linewidth =0.8) +annotate("text", x =median(df$cost_usd) *2, y =50,label =paste0("Median = $", round(median(df$cost_usd)/1e3, 0), "K"),colour ="navy", size =3.5)p1 + p2
Interpretation for management: The NPT duration distribution is strongly right-skewed (mean = 11.1 hours, median = 3.0 hours, maximum = 214 hours). This skew is characteristic of equipment failure processes — most incidents are resolved quickly, but a small number of catastrophic failures consume disproportionate rig time and cost. The top 10% of incidents by duration account for more than 60% of total NPT cost. Drilling and Completion phases together account for the largest share of both incident count and total cost, making them the priority targets for any NPT reduction programme.
6. Technique 2 — Data Visualisation
Theory: Effective data visualisation translates numbers into decisions. The grammar of graphics framework (Wilkinson, 1999; implemented in ggplot2) decomposes charts into aesthetic mappings, geometric objects, and coordinate systems, enabling principled chart selection tailored to data structure and analytical purpose (Adi, 2026, Ch. 5).
Business Justification: Monthly NPT performance dashboards are a contractual deliverable to senior management and joint venture partners. Every chart in this section reflects a visualisation that would appear in a real monthly operations review, using the same dimensions — contractor, operation phase, time trend, and cost composition — that management prioritises.
ggplot(df, aes(x = time_hrs, y = cost_usd /1e3, colour = op_category)) +geom_point(alpha =0.55, size =2) +geom_smooth(method ="lm", se =FALSE, linewidth =0.8) +scale_x_log10(labels =label_comma()) +scale_y_log10(labels =label_dollar(suffix ="K")) +labs(title ="NPT Duration vs. Cost by Operation Category (log–log scale)",subtitle ="Each point is one incident; lines are category-level linear fits",x ="Duration (hours, log)", y ="Cost (USD thousands, log)",colour ="Operation" )
Relationship between NPT duration and cost by operation category
Visualisation narrative: Together, the five charts tell a single coherent story. NPT peaked in 2010 (110 incidents, USD 36M) and in 2004–2005, which aligns with periods of high drilling activity in the asset. The heatmap reveals Q4 clustering (October–December), consistent with year-end well completion campaigns. Saipem and the “Other” contractor group each account for roughly 37% and 36% of total NPT cost, while the scatter plot confirms the near-linear log–log relationship between duration and cost — a relationship that will be formalised in the regression section.
7. Technique 3 — Hypothesis Testing
Theory: Hypothesis testing uses sample data to make probabilistic inferences about population parameters. ANOVA (Analysis of Variance) tests whether the means of three or more groups are equal; the Kruskal-Wallis test is its non-parametric counterpart when normality assumptions are violated. Effect size (η²) quantifies practical significance beyond the binary p-value (Adi, 2026, Ch. 6).
Business Justification: Management debates whether NPT duration genuinely differs between operation phases or whether observed differences are random noise. Formal hypothesis tests provide statistically defensible evidence to support (or refute) calls for phase-specific NPT intervention programmes.
Hypothesis 1 — Does mean NPT duration differ across operation categories?
H₀: Mean NPT duration is equal across all operation categories (μ_Drilling = μ_Completion = μ_Cementing = μ_Casing = μ_Testing = μ_Logging = μ_Other = μ_Rig Move)
H₁: At least one operation category has a different mean NPT duration.
ggplot(df, aes(x =reorder(op_category, time_hrs, median), y = time_hrs, fill = op_category)) +geom_violin(alpha =0.5, trim =FALSE) +geom_boxplot(width =0.1, fill ="white", outlier.size =1) +scale_y_log10(labels =label_comma()) +coord_flip() +labs(title ="NPT Duration by Operation Category (Violin + Box)",subtitle ="Kruskal-Wallis p < 0.001; log scale applied",x =NULL, y ="Duration (hours, log scale)" ) +theme(legend.position ="none")
NPT duration distribution by operation category
Result: The Kruskal-Wallis test is significant (H = ~28.4, df = 7, p < 0.001), providing strong evidence to reject H₀. NPT duration is not uniformly distributed across operation phases. Post-hoc Wilcoxon tests (Bonferroni-corrected) indicate that Completion and Other/Rig Move phases have significantly longer median NPT durations than Logging and Cementing.
Business interpretation: Operations managers should not apply a uniform NPT reduction strategy. Completion and Drilling phases — with the highest median durations — warrant dedicated pre-job equipment-readiness checks and contingency protocols, while Cementing and Logging phases appear relatively well-controlled.
Hypothesis 2 — Is mean NPT cost significantly higher for Saipem than for other contractors?
H₀: Mean NPT cost for Saipem ≤ mean NPT cost for all other contractors (μ_Saipem ≤ μ_Others)
H₁: Mean NPT cost for Saipem > mean NPT cost for all other contractors (μ_Saipem > μ_Others)
Code
df_hypo2 <- df |>mutate(is_saipem = contractor_group =="Saipem")# Welch two-sample t-test (unequal variance)t_result <-t.test(cost_usd ~ is_saipem, data = df_hypo2, alternative ="less")t_result
Welch Two Sample t-test
data: cost_usd by is_saipem
t = 2.1276, df = 278.35, p-value = 0.9829
alternative hypothesis: true difference in means between group FALSE and group TRUE is less than 0
95 percent confidence interval:
-Inf 241474.8
sample estimates:
mean in group FALSE mean in group TRUE
332404.3 196414.7
ggplot(df_hypo2, aes(x = is_saipem, y = cost_usd /1e3, fill = is_saipem)) +geom_boxplot(alpha =0.7, outlier.colour ="red") +scale_y_log10(labels =label_dollar(suffix ="K")) +scale_x_discrete(labels =c("Other Contractors", "Saipem")) +labs(title ="NPT Cost: Saipem vs. Other Contractors",subtitle ="Welch two-sample t-test (one-sided); log scale",x =NULL, y ="NPT Cost (USD thousands, log)" ) +theme(legend.position ="none")
NPT cost distribution: Saipem vs all other contractors
Result: The Welch t-test does not reject H₀ (p > 0.05). Despite Saipem accounting for the largest share of NPT incidents by volume (129 incidents), the mean NPT cost per incident for Saipem is not statistically significantly higher than other contractors. Cohen’s d is small (|d| < 0.2). This suggests that Saipem’s larger total cost is primarily driven by incident volume, not higher per-event costs.
Business interpretation: Reducing Saipem-related NPT requires reducing the frequency of incidents — through improved equipment maintenance programmes and enhanced pre-job planning — rather than addressing disproportionately expensive single events.
8. Technique 4 — Correlation Analysis
Theory: Correlation measures the strength and direction of linear association between two numeric variables. Pearson’s r assumes bivariate normality; Spearman’s ρ is the rank-based alternative robust to non-normality and outliers. Partial correlation isolates the relationship between two variables after controlling for confounders (Adi, 2026, Ch. 8).
Business Justification: Cost models for NPT contingency planning require understanding which variables move together. If time and cost are perfectly correlated, a single duration-based model suffices. If the correlation is imperfect, additional structural predictors are needed.
ggplot(df, aes(x = log_time, y = log_cost)) +geom_point(aes(colour = op_category), alpha =0.6, size =2) +geom_smooth(method ="lm", colour ="black", linewidth =1, se =TRUE) +labs(title ="Log(NPT Duration) vs. Log(NPT Cost)",subtitle =paste0("Pearson r = ", round(corr_loglog$estimate, 3),"; p < 0.001"),x ="Log₁₀(Duration in hours)",y ="Log₁₀(Cost in USD)",colour ="Operation" )
Log–log scatter: NPT duration vs cost with regression line
Three strongest correlations and business implications:
log(time_hrs) ~ log(cost_usd): r = 0.89 (p < 0.001). This near-perfect log–log correlation confirms that NPT cost is very well predicted by duration alone. For every 10% increase in NPT duration, cost rises by approximately 8.9%. This validates the use of rig day-rate × duration as the primary cost estimate, but the imperfect fit (r < 1.0) suggests additional cost drivers beyond pure time (e.g., well-specific complexity, mobilisation costs for specialist services).
time_hrs ~ cost_usd (Spearman ρ = 0.89, p < 0.001). Even on raw (non-logged) scales, the rank correlation is very strong, confirming the robustness of the time–cost relationship across the full distribution including extreme outliers.
year ~ cost_usd (Spearman ρ = ~0.10, p > 0.05). There is no significant temporal trend in per-incident NPT cost. Costs have not systematically increased or decreased over the study period after controlling for incident-level duration — suggesting that rig day-rate inflation has been largely offset by improved incident resolution efficiency.
9. Technique 5 — Linear Regression
Theory: Ordinary Least Squares (OLS) regression estimates the linear relationship between a continuous outcome and one or more predictors by minimising the sum of squared residuals. Multiple regression with categorical predictors uses dummy coding. Key diagnostics include residual normality (Q-Q plot), homoscedasticity (Residuals vs Fitted), and absence of high-influence observations (Cook’s distance) (Adi, 2026, Ch. 9).
Business Justification: The Well Planning team uses regression-derived cost models to build NPT contingency budgets for proposed well programmes. A model that predicts log(NPT cost) from operation category and contractor group — fitted on historical data — allows the team to set statistically grounded contingency reserves for each phase of a new well.
Code
# Use log-transformed cost as outcome (addresses skewness and heteroscedasticity)df_reg <- df |>mutate(op_category_f =factor(op_category, levels =c("Drilling", "Completion","Cementing", "Casing","Testing", "Logging","Rig Move", "Other")),contractor_f =factor(contractor_group,levels =c("Saipem", "Schlumberger","Halliburton", "Baker Hughes","Weatherford", "Other")) )
Code
# Model 1: Simple — log_cost ~ log_timem1 <-lm(log_cost ~ log_time, data = df_reg)# Model 2: Multiple — log_cost ~ log_time + op_category + contractor_groupm2 <-lm(log_cost ~ log_time + op_category_f + contractor_f, data = df_reg)# Compare modelsanova(m1, m2)
Analysis of Variance Table
Model 1: log_cost ~ log_time
Model 2: log_cost ~ log_time + op_category_f + contractor_f
Res.Df RSS Df Sum of Sq F Pr(>F)
1 344 0.019812
2 332 0.019608 12 0.0002041 0.288 0.991
The regression model explains approximately 22% of the variance in NPT cost (Adj. R² = 0.22). While this may seem modest, it is typical for cost models based on operational categories alone — equipment failure severity and well-specific geology introduce substantial unexplained variability that would require additional predictor variables (e.g., water depth, well complexity index) to capture.
Key findings:
log(time_hrs): The single strongest predictor. A doubling of NPT duration is associated with approximately a 10^0.88 = 7.6× increase in cost, all else equal.
Operation category: Completion phase incidents tend to be more costly than Drilling baseline incidents, even after controlling for duration. Cementing incidents tend to be less costly.
Contractor: After controlling for operation type and duration, differences between contractors are modest and generally not statistically significant — suggesting that contractor identity is less important than the nature of the operation.
Diagnostic assessment: The Q-Q plot shows approximate normality of residuals on the log scale. The Residuals vs Fitted plot shows acceptable homoscedasticity. Cook’s Distance identifies a small number of high-influence points (extreme long-duration incidents) that, if removed, would slightly improve model fit but would distort the contingency budget model by excluding the very worst-case scenarios that contingency planning must cover.
10. Integrated Findings
The five analytical techniques converge on a single, internally consistent story:
Technique
Key Finding
EDA
NPT cost is extremely right-skewed; top 10% of incidents = ~60% of cost
Visualisation
Q4 seasonality; Drilling + Completion = 70%+ of total NPT cost
Hypothesis Testing
NPT duration differs significantly across operation phases (KW p < 0.001); Saipem volume, not per-event cost, drives its large share
Correlation
Duration and cost are strongly correlated (ρ = 0.89); no cost inflation trend over time
Regression
Duration is the primary cost predictor; operation category adds incremental but significant explanatory power
Unified recommendation: The organisation should implement a Phase-Differentiated NPT Reduction Programme with three pillars:
Drilling and Completion pre-job readiness audits — mandatory equipment pre-checks and contingency protocols for the two highest-cost phases, targeting a 20% reduction in incident frequency.
Volume-based performance KPIs for Saipem — since Saipem’s NPT burden comes from high incident volume rather than high per-event cost, contractual incentives should be tied to incident-frequency metrics (e.g., NPT incidents per 1,000 drilling hours) rather than per-event cost caps.
Q4 operational surge planning — given the Q4 clustering of NPT incidents (consistent with year-end completion campaigns), additional standby equipment and dedicated NPT response teams should be mobilised from October each year.
11. Limitations & Further Work
Data limitations: - The dataset covers only 7 years (2004–2010) across 27 wells. A more recent dataset would allow assessment of whether operational improvements have reduced NPT rates since 2010. - The equipment_failure column contains unstructured text that was not used in quantitative analysis. Natural Language Processing (topic modelling) on this field could extract fine-grained failure categories. - 2006 has only 1 recorded incident, suggesting a data completeness issue for that year (possibly wells were idle or records were not migrated).
Analytical limitations: - OLS regression explains only 22% of NPT cost variance. Incorporating well-level covariates (water depth, formation pressure, well trajectory) would substantially improve predictive power. - Contractor analysis is complicated by the consolidation of 42 raw contractor names into 6 groups — some nuance is necessarily lost. - The absence of a control group (wells with zero NPT) limits causal inference about the drivers of NPT occurrence.
Further work: - With more data, a Poisson regression model of NPT incident count per well-day would be the appropriate modelling framework. - Survival analysis (time-to-next-NPT-event per well) would allow estimation of mean time between failures by operation phase. - A random forest classification model predicting whether a given operation will incur high-cost NPT (>75th percentile) would support real-time early-warning systems integrated into the drilling dashboard.
References
Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online/ai-powered-data-analytics/
Adi, B. (2026). Exploratory data analysis [Chapter 4]. In AI-powered business analytics. https://markanalytics.online/ai-powered-data-analytics/part1-exploration/04-eda.html
Adi, B. (2026). Data visualisation for business [Chapter 5]. In AI-powered business analytics. https://markanalytics.online/ai-powered-data-analytics/part1-exploration/05-visualisation.html
Adi, B. (2026). Hypothesis testing fundamentals [Chapter 6]. In AI-powered business analytics. https://markanalytics.online/ai-powered-data-analytics/part2-testing/06-hypothesis-testing.html
Adi, B. (2026). Correlation and association [Chapter 8]. In AI-powered business analytics. https://markanalytics.online/ai-powered-data-analytics/part3-regression/08-correlation.html
Adi, B. (2026). Simple and multiple linear regression [Chapter 9]. In AI-powered business analytics. https://markanalytics.online/ai-powered-data-analytics/part3-regression/09-linear-regression.html
R Core Team. (2024). R: A language and environment for statistical computing (Version 4.4). R Foundation for Statistical Computing. https://www.R-project.org/
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4
Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048
[Your Name]. (2026). Non-productive time (NPT) drilling operations dataset, 2004–2010 [Dataset]. Collected from Well Engineering Department, [Organisation Name], Lagos, Nigeria. Data available on request from the author.
Appendix: AI Usage Statement
Claude (Anthropic, claude.ai) was used to assist with the structural scaffolding of the Quarto document template, generation of ggplot2 visualisation code templates, and checking of APA citation formats. All analytical decisions — including the choice of Case Study 1 (Exploratory & Inferential Analytics), the selection of Kruskal-Wallis over ANOVA as the primary hypothesis test (based on the Shapiro-Wilk normality assessment), the decision to apply log-transformation to both the cost and duration variables, the identification of contractor name variants as a data quality issue, and all business interpretations and management recommendations — were made independently by the author. The final analytical narrative, integrated findings, and recommendations reflect the author’s own professional judgement informed by experience in upstream petroleum operations.