ALY 6070 · Communication and Visualization for Data Analytics

Anticipating Equipment Failure: Initial Analysis of Industrial Process Data

An EDA-to-baseline-model study of 10,000 milling-machine runs — combining a data scientist’s search for predictive signal with a business analyst’s view of cost, risk, and action.


📊 Dashboard

Executive summary

Across 10,000 runs, only 3.91% fail, yet failure is highly structured. Power and Overstrain modes drive most cost; the Low-quality variant fails most (4.49%); and risk concentrates at both torque extremes and once tool wear exceeds ~200 min. A leakage-controlled logistic baseline already separates failures well (ROC-AUC 0.89, PR-AUC 0.46), and a single explainable rule catches 70% of failures — evidence that a predictive-maintenance program is both feasible and worthwhile.

🧭 How to read this dashboard: start with the KPI cards, then read the charts in order — what fails (mode), who fails (variant), when (torque & tool-wear thresholds), and what it costs. Color is consistent across every tab.

Color key: High risk / Failure · Moderate risk · Low risk / Safe band · Healthy run · Process variable

  • Production runs: 10,000
  • Failure rate: 3.91%
  • Total failures: 391
  • Est. downtime cost*: $399K
  • Avg torque: 40 Nm
  • Avg tool wear: 114 min
  • Model ROC-AUC: 0.89
  • Healthy : Failed: 24.6:1
tibble(mode=names(mc), n=as.integer(mc)) %>% mutate(lab=mode_full[mode]) %>% arrange(n) %>%
  mutate(lab=factor(lab, levels=lab)) %>%
  ggplot(aes(n, lab, fill=mode)) + geom_col() +
  geom_text(aes(label=n), hjust=-0.25, size=3.4, color=ink) +
  scale_fill_manual(values=mode_cols, guide="none") +
  scale_x_continuous(expand=expansion(mult=c(0,.15))) +
  labs(title="Failures by Mode", x="Number of Failures", y=NULL)
Power & Heat-Dissipation modes account for most stoppages.

tibble(type=names(ft), r=as.numeric(ft)) %>%
  mutate(type=factor(type, levels=c("Low","Medium","High")),
         col=c(Low=red, Medium=green, High=amber)[as.character(type)]) %>%
  ggplot(aes(type, r, fill=col)) + geom_col() +
  geom_text(aes(label=sprintf('%.2f%%', r)), vjust=-0.4, size=3.4, color=ink) +
  scale_fill_identity() + scale_y_continuous(labels=function(x)paste0(x,"%"), expand=expansion(mult=c(0,.13))) +
  labs(title="Failure Rate by Product Quality", x="Product Variant", y="Failure Rate")
The Low-quality variant fails ~1.5x more often than High.

df %>% mutate(b=cut(torque_nm,c(0,20,30,40,50,60,90),labels=c("<20","20-30","30-40","40-50","50-60","60+"))) %>%
  group_by(b) %>% summarise(r=mean(machine_failure)*100,.groups="drop") %>%
  ggplot(aes(b, r, fill=riskcol(r))) + geom_col() +
  geom_text(aes(label=sprintf('%.0f%%', r)), vjust=-0.4, size=3.2, color=ink) +
  scale_fill_identity() + scale_y_continuous(labels=function(x)paste0(x,"%"), expand=expansion(mult=c(0,.13))) +
  labs(title="Failures Cluster at Low & High Torque", x="Torque [Nm]", y="Failure Rate")
Risk is U-shaped: lowest mid-range, highest at both torque extremes.

df %>% mutate(b=cut(tool_wear_min,c(0,50,100,150,200,260),labels=c("0-50","50-100","100-150","150-200","200+"),include.lowest=TRUE)) %>%  # keep runs with 0 min of wear
  group_by(b) %>% summarise(r=mean(machine_failure)*100,.groups="drop") %>%
  ggplot(aes(b, r, fill=riskcol(r))) + geom_col() +
  geom_text(aes(label=sprintf('%.0f%%', r)), vjust=-0.4, size=3.2, color=ink) +
  scale_fill_identity() + scale_y_continuous(labels=function(x)paste0(x,"%"), expand=expansion(mult=c(0,.13))) +
  labs(title="Failure Rate vs Tool Wear", x="Tool Wear [min]", y="Failure Rate")
A clear threshold — risk jumps ~5x once tool wear passes 200 min.
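The binned failure rates in the torque and tool-wear charts can be reproduced in base R with a small helper. This is a sketch, not the report's code: `failure_rate_by_bin()` is a hypothetical name and the toy vectors are made up. It also shows why `include.lowest = TRUE` matters here. Tool wear has a minimum of 0 min, and `cut()` builds right-closed bins by default, so with breaks starting at 0 the 0-min runs would fall outside every bin.

```r
# Hypothetical helper (illustration only): failure rate in % per bin.
failure_rate_by_bin <- function(x, fail, breaks, labels) {
  # include.lowest = TRUE closes the first bin on the left, so x == breaks[1]
  # (e.g. a tool wear of 0 min) is kept instead of becoming NA.
  b <- cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
  r <- tapply(fail, b, mean)               # empty bins come back as NA
  setNames(100 * as.numeric(r), levels(b))
}

wear_breaks <- c(0, 50, 100, 150, 200, 260)
wear_labels <- c("0-50", "50-100", "100-150", "150-200", "200+")

# Toy data: five runs, one failure, both high-wear runs in the "200+" bin.
rates <- failure_rate_by_bin(x = c(0, 10, 60, 210, 220),
                             fail = c(0, 0, 0, 1, 0),
                             breaks = wear_breaks, labels = wear_labels)
print(rates)

# With the default right-closed bins, a 0-min run is silently dropped:
print(is.na(cut(0, breaks = wear_breaks)))   # TRUE
```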

tibble(mode=names(cost_v), cost=as.numeric(cost_v)) %>% mutate(lab=mode_full[mode]) %>% arrange(cost) %>%
  mutate(lab=factor(lab, levels=lab)) %>%
  ggplot(aes(cost, lab, fill=mode)) + geom_col() +
  geom_text(aes(label=paste0('$', round(cost/1000),'K')), hjust=-0.2, size=3.2, color=ink) +
  scale_fill_manual(values=mode_cols, guide="none") +
  scale_x_continuous(labels=dollar, expand=expansion(mult=c(0,.18))) +
  labs(title="Estimated Downtime Cost by Mode*", x="Estimated Cost (USD)", y=NULL)
Power & Overstrain dominate cost; Overstrain rivals Power despite fewer events because each Overstrain event carries a longer assumed repair time.

The business case

At an assumed blended rate of $250/hr of unplanned downtime and mode-specific repair times, the 391 failures represent roughly $399,000 in downtime cost (about $1,020 per event). The explainable early-warning rule (see Modeling) would catch about 70% of failures, corresponding to roughly $278,000 (estimated at $277,565) in addressable cost.

*Cost figures are illustrative assumptions for prioritization, not measured financials.
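The headline cost figures can be cross-checked with simple arithmetic. This sketch uses only the numbers quoted above. The mode-specific repair times behind the $399K total are not shown in this section, so it works backwards from the total rather than rebuilding it.

```r
rate_per_hr <- 250      # assumed blended downtime rate (USD/hr), from the text
total_cost  <- 399000   # estimated downtime cost of all 391 failures (USD)
n_failures  <- 391
rule_recall <- 0.70     # approximate share of failures the rule catches

cost_per_event <- total_cost / n_failures       # about 1,020 USD per failure
implied_hours  <- cost_per_event / rate_per_hr  # about 4.1 h of downtime per failure
naive_addressable <- rule_recall * total_cost   # 279,300 USD if caught failures were average

# The reported $277,565 corresponds to about 69.6% of the total.
share_reported <- 277565 / total_cost
round(c(cost_per_event = cost_per_event, implied_hours = implied_hours,
        naive_addressable = naive_addressable, share_reported = share_reported), 3)
```

The reported figure sits slightly below the naive 70% × total. That is consistent with it being summed over the specific failures the rule catches, using their mode-specific costs and the exact recall, rather than scaled from the average. Treat both numbers as order-of-magnitude estimates.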

🔬 Exploratory Analysis

Hypotheses

We enter the analysis with four testable expectations:
  • H1 — Higher tool wear is associated with higher failure (expected threshold near end-of-life).
  • H2 — Torque relates to failure non-linearly (both extremes risky via the power band).
  • H3 — Product variant is associated with failure rate (Low > High).
  • H4 — Engineered power and temperature-difference features carry most of the predictive signal.

The variables — and why they matter

Q: What are the variables, and why are they important?
tribble(
 ~Variable, ~Type, ~Units, ~`Why it matters`,
 "udi, product_id","Identifier","—","Run / part traceability",
 "type (Low/Med/High)","Categorical","—","Quality variant; sets failure thresholds",
 "air_temperature_k","Numeric","K","Ambient input to heat dissipation",
 "process_temperature_k","Numeric","K","Machining heat; drives HDF with air temp",
 "rotational_speed_rpm","Numeric","rpm","Speed; with torque sets power",
 "torque_nm","Numeric","Nm","Load; drives power & overstrain",
 "tool_wear_min","Numeric","min","Cumulative wear; drives TWF & OSF",
 "machine_failure","Binary (target)","0/1","Did the run fail? (headline KPI)",
 "twf/hdf/pwf/osf/rnf","Binary","0/1","Five specific failure modes (the 'why')"
) %>% kable()
Variable | Type | Units | Why it matters
udi, product_id | Identifier | n/a | Run / part traceability
type (Low/Med/High) | Categorical | n/a | Quality variant; sets failure thresholds
air_temperature_k | Numeric | K | Ambient input to heat dissipation
process_temperature_k | Numeric | K | Machining heat; drives HDF with air temp
rotational_speed_rpm | Numeric | rpm | Speed; with torque sets power
torque_nm | Numeric | Nm | Load; drives power & overstrain
tool_wear_min | Numeric | min | Cumulative wear; drives TWF & OSF
machine_failure | Binary (target) | 0/1 | Did the run fail? (headline KPI)
twf/hdf/pwf/osf/rnf | Binary | 0/1 | Five specific failure modes (the ‘why’)

Data quality, descriptive statistics & class balance

df %>% select(all_of(numv)) %>% pivot_longer(everything(), names_to="Variable") %>%
  group_by(Variable) %>%
  summarise(n=n(), Mean=mean(value), SD=sd(value), Min=min(value),
            Median=median(value), Max=max(value), .groups="drop") %>%
  mutate(across(c(Mean,SD,Min,Median,Max), ~round(.,2))) %>% kable(caption="Descriptive statistics (numeric variables)")
Descriptive statistics (numeric variables)
Variable | n | Mean | SD | Min | Median | Max
air_temperature_k | 10000 | 300.00 | 2.00 | 296.40 | 300.6 | 303.90
power_w | 10000 | 6286.47 | 1074.45 | 775.92 | 6387.2 | 10665.67
process_temperature_k | 10000 | 310.20 | 1.49 | 305.80 | 310.3 | 314.40
rotational_speed_rpm | 10000 | 1539.66 | 178.28 | 1168.00 | 1537.0 | 2196.00
temp_diff | 10000 | 10.20 | 1.01 | 6.80 | 10.2 | 13.70
tool_wear_min | 10000 | 114.31 | 52.31 | 0.00 | 109.0 | 250.00
torque_nm | 10000 | 40.00 | 10.08 | 3.50 | 40.0 | 87.20
📌 Completeness: 10,000 rows, no missing values. Class imbalance: failures are only 3.91% of runs, a 24.6:1 healthy-to-failed ratio. Plain accuracy is therefore misleading (a model that predicts "no failure" for every run would already score 96.1% while catching nothing), so we report recall, precision and PR-AUC and tune the decision threshold. Sanity check: the five mode flags sum to 411 but there are only 391 failures; the 20-flag gap (411 − 391) is consistent with some runs tripping multiple modes. The flags are leakage for prediction and are dropped before modeling.
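The flag-versus-failure sanity check can be made explicit with a small base-R sketch. `mode_check()` is a hypothetical helper, and the column names follow the variable table above. Besides counting multi-mode runs, it counts the two cases that would break the "overlap explains the gap" reading: failures with no mode flag, and mode flags on runs not marked as failed.

```r
# Hypothetical helper: reconcile failure-mode flags with the machine_failure target.
mode_check <- function(df) {
  modes   <- c("twf", "hdf", "pwf", "osf", "rnf")
  n_modes <- rowSums(df[, modes])            # number of modes tripped per run
  c(failures        = sum(df$machine_failure),
    mode_flags      = sum(n_modes),
    multi_mode_runs = sum(n_modes > 1),      # runs whose flags count more than once
    failure_no_flag = sum(df$machine_failure == 1 & n_modes == 0),
    flag_no_failure = sum(df$machine_failure == 0 & n_modes > 0))
}

# Toy data: run 1 trips two modes; run 3 has an RNF flag but no recorded failure.
toy <- data.frame(machine_failure = c(1, 1, 0, 0),
                  twf = c(1, 0, 0, 0), hdf = c(1, 0, 0, 0), pwf = c(0, 1, 0, 0),
                  osf = c(0, 0, 0, 0), rnf = c(0, 0, 1, 0))
r <- mode_check(toy)
r
```

If overlap alone explains the gap, the last two counts are zero and `mode_flags - failures` equals the number of extra flags on multi-mode runs. In the toy data the gap of 2 comes from one overlap and one flag without a failure.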

Outlier scan (IQR rule)

map_dfr(numv, function(v) {
  x   <- df[[v]]
  q   <- quantile(x, c(.25, .75), names = FALSE)        # Q1, Q3
  iqr <- q[2] - q[1]
  out <- x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr    # Tukey fences
  tibble(Variable = v, Outliers = sum(out), `% of rows` = round(mean(out) * 100, 2))
}) %>%
  kable(caption = "Mild outliers flagged by the 1.5xIQR rule")
Mild outliers flagged by the 1.5xIQR rule
Variable | Outliers | % of rows
air_temperature_k | 0 | 0.00
process_temperature_k | 0 | 0.00
rotational_speed_rpm | 37 | 0.37
torque_nm | 67 | 0.67
tool_wear_min | 0 | 0.00
power_w | 161 | 1.61
temp_diff | 31 | 0.31

Mild outliers appear only in the operating tails: power (1.61% of rows), torque (0.67%), rotational speed (0.37%) and temperature difference (0.31%). They are informative rather than errors, since the tails are exactly where failures occur, so they are retained.

Feature engineering

Two engineered features encode the physics of the documented failure rules and, as the model later confirms, carry most of the signal:

  • power_w = torque × angular speed = torque_nm × rotational_speed_rpm × 2π/60 (W). It linearizes the power-failure mechanism, which has a "safe band" of 3,500–9,000 W.
  • temp_diff = process_temperature_k − air_temperature_k (K). It captures the heat-dissipation condition directly.
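The two feature definitions above translate directly into R. This is a sketch that assumes the column names from the variable table. `add_features()` is a hypothetical name, and the `in_power_band` flag is an extra illustration, not one of the report's features.

```r
# Hypothetical helper: derive the two physics-based features described above.
add_features <- function(df) {
  omega <- df$rotational_speed_rpm * 2 * pi / 60                    # rpm -> rad/s
  df$power_w   <- df$torque_nm * omega                               # P = T * omega, in W
  df$temp_diff <- df$process_temperature_k - df$air_temperature_k    # in K
  df$in_power_band <- df$power_w >= 3500 & df$power_w <= 9000        # safe band
  df
}

# Usage on two made-up runs: a mid-range run and a low-torque run.
runs <- data.frame(torque_nm = c(40, 10), rotational_speed_rpm = c(1500, 2000),
                   process_temperature_k = c(310.2, 309.0),
                   air_temperature_k = c(300.0, 300.5))
out <- add_features(runs)
out   # power_w: about 6283 W (in band) and 2094 W (below band)
```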

Binned versions of torque and tool wear (used in the dashboard) make the non-linear thresholds legible to a business audience.

Distributions of the process measurements

df %>% select(all_of(numv)) %>% pivot_longer(everything()) %>%
  ggplot(aes(value)) + geom_histogram(bins=35, fill=navy, color="white", linewidth=0.2) +
  facet_wrap(~name, scales="free", ncol=3) + labs(title="Distributions of Numeric Variables", x=NULL, y="Count")
Temperatures are tightly controlled, torque is ~Normal(40,10), speed is right-skewed, tool wear spans 0-250 min.

Statistical test — variant vs failure

The chi-square test of independence between product variant and failure (a 3 × 2 table, so df = 2) gives χ² = 13.2, p = 0.0014, so the association in H3 is statistically significant: the failure rate differs by variant.
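The reported p-value follows directly from χ² and the degrees of freedom. With df = 2 the chi-square upper tail has the closed form exp(−χ²/2), so the sketch below checks that p is about 0.0014, i.e. just above the 0.001 level. `variant_test()` is a hypothetical wrapper for the raw data. For tables larger than 2 × 2, `chisq.test()` applies no continuity correction.

```r
# A 3 x 2 table (variant x failed/not failed) has df = (3 - 1) * (2 - 1) = 2.
# With 2 degrees of freedom the chi-square upper tail is exactly exp(-x / 2).
chi2 <- 13.2
p_value  <- pchisq(chi2, df = 2, lower.tail = FALSE)
p_closed <- exp(-chi2 / 2)
c(p_value = p_value, p_closed = p_closed)   # both about 0.00136

# Hypothetical helper for the raw data (column names as in the variable table):
variant_test <- function(df) chisq.test(table(df$type, df$machine_failure))
```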

Which modes matter most — a Pareto view

pc <- tibble(mode=names(mc), n=as.integer(mc)) %>% arrange(desc(n)) %>%
  mutate(mode=factor(toupper(mode), levels=toupper(mode)), cum=cumsum(n)/sum(n)*100)
ggplot(pc, aes(mode, n)) +
  geom_col(aes(fill=as.character(mode))) +
  geom_line(aes(y=cum/100*max(n), group=1), color=red, linewidth=1) +
  geom_point(aes(y=cum/100*max(n)), color=red) +
  geom_hline(yintercept=0.8*max(pc$n), linetype="dashed", color=slate) +
  scale_fill_manual(values=mode_cols, guide="none") +
  scale_y_continuous(sec.axis=sec_axis(~./max(pc$n)*100, name="Cumulative %")) +
  labs(title="Pareto of Failure Modes", x=NULL, y="Failures")
About 80% of mode flags come from the top two to three modes, the classic Pareto signal of where to focus.
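The Pareto logic reduces to sorting counts and finding where the cumulative share first crosses 80%. Below is a base-R sketch with made-up counts. `pareto_table()` is a hypothetical helper. If it is applied to the mode flags, the denominator is flags (411 here), not failed runs, because one run can carry several flags.

```r
# Hypothetical helper: sort mode counts, add cumulative %, and find how many
# modes are needed to reach the target share (80% by default).
pareto_table <- function(counts, target = 80) {
  counts  <- sort(counts, decreasing = TRUE)
  cum_pct <- 100 * cumsum(counts) / sum(counts)
  list(table = data.frame(mode = names(counts), n = unname(counts),
                          cum_pct = unname(cum_pct)),
       modes_to_target = unname(which(cum_pct >= target)[1]))
}

# Made-up counts, not the AI4I figures:
res <- pareto_table(c(a = 50, b = 30, c = 15, d = 5))
res$table
res$modes_to_target   # 2
```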

The risk surface: torque x tool wear

df %>% mutate(tb=cut(torque_nm,c(0,25,35,45,55,90),labels=c("<25","25-35","35-45","45-55","55+")),
              wb=cut(tool_wear_min,c(0,60,120,180,260),labels=c("0-60","60-120","120-180","180+"),include.lowest=TRUE)) %>%
  group_by(tb,wb) %>% summarise(r=mean(machine_failure)*100,.groups="drop") %>%
  ggplot(aes(wb, tb, fill=r)) + geom_tile(color="white", linewidth=0.6) +
  geom_text(aes(label=sprintf('%.1f', r)), size=3, color=ink) +
  scale_fill_gradient(low="#fff7ec", high=red, name="Fail %") +
  labs(title="Failure-Rate Risk Surface: Torque x Tool Wear", x="Tool wear [min]", y="Torque [Nm]") +
  theme(legend.position="right")
Risk is not additive — it concentrates in the corners: low torque (any wear) and high torque WITH high wear.

The power “safe band” mechanism

ggplot(df, aes(power_w, fill=factor(machine_failure))) +
  annotate("rect", xmin=3500, xmax=9000, ymin=0, ymax=Inf, fill=green, alpha=0.10) +
  geom_histogram(bins=60, position="identity", alpha=0.6, color=NA) +
  geom_vline(xintercept=c(3500,9000), linetype="dashed", color=green) +
  scale_fill_manual(values=c(`0`=teal,`1`=red), labels=c("No failure","Failure"), name=NULL) +
  labs(title="Power Output vs Failure (safe band shaded)", x="Power [W]", y="Count")
Failures (red) pile up OUTSIDE the 3,500-9,000 W band while healthy runs sit inside — a direct view of the power-failure rule.

Torque–speed signature & driver differences

ggplot(df, aes(torque_nm, rotational_speed_rpm)) +
  geom_point(data=filter(df,machine_failure==0), color="grey60", alpha=0.16, size=0.8) +
  geom_point(data=filter(df,machine_failure==1), color=red, alpha=0.85, size=1.6) +
  labs(title="Torque vs Rotational Speed (failures highlighted)", x="Torque [Nm]", y="Rotational speed [rpm]")
Failures trace the EDGES of the operating cloud — where the torque-speed balance leaves the safe power band.

df %>% select(machine_failure, torque_nm, tool_wear_min, rotational_speed_rpm) %>%
  pivot_longer(-machine_failure) %>%
  ggplot(aes(factor(machine_failure), value, fill=factor(machine_failure))) +
  geom_boxplot(outlier.size=0.5) + facet_wrap(~name, scales="free_y") +
  scale_fill_manual(values=c(teal,red), guide="none") +
  scale_x_discrete(labels=c("OK","Fail")) +
  labs(title="Key Drivers: Failed vs Healthy Runs", x=NULL, y=NULL)
Failed runs carry higher tool wear and a wider torque spread.

Failure modes by product variant

df %>% group_by(type) %>% summarise(across(c(twf,hdf,pwf,osf,rnf), sum)) %>%
  pivot_longer(-type, names_to="Mode", values_to="Count") %>% mutate(Mode=toupper(Mode)) %>%
  ggplot(aes(type, Count, fill=Mode)) + geom_col() +
  scale_fill_manual(values=mode_cols) +
  labs(title="Failure Modes by Product Variant", x="Product Variant", y="Failures")
The Low-quality variant contributes the most failures — and the highest rate, not just volume.

Interactive data preview

prev <- df %>% select(udi, type, all_of(numv), machine_failure) %>% head(60)
if (requireNamespace("DT", quietly=TRUE)) {
  DT::datatable(prev, rownames=FALSE, options=list(pageLength=8, scrollX=TRUE))
} else { kable(head(prev,8), caption="First rows (install 'DT' for an interactive table)") }

🤖 Modeling & Risk

A data-scientist lens: can we predict failure before it happens — and turn that into an action?

Method

We fit a logistic-regression baseline on a stratified 70/30 train/test split, using only the process variables (the five mode flags are dropped to prevent leakage). Numeric features are standardized, and because of the strong 24.6:1 class imbalance we judge the model with ROC-AUC, PR-AUC, recall and precision rather than accuracy, and tune the decision threshold for a high-recall operating point.
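The method can be sketched in base R. The helper names, the seed and the toy data are assumptions. The report does not say whether scaling uses training-set statistics. This sketch fits the center and scale on the training rows only, so that no test information leaks into preprocessing.

```r
# Hypothetical helpers (illustration only): stratified split + train-based scaling.
stratified_split <- function(y, train_frac = 0.7, seed = 42) {
  set.seed(seed)
  by_class <- split(seq_along(y), y)                  # row indices per class
  idx <- lapply(by_class, function(i)
    i[sample.int(length(i), round(train_frac * length(i)))])
  sort(unlist(idx, use.names = FALSE))                # training-row indices
}

standardize_with_train <- function(train, test, cols) {
  mu <- colMeans(train[cols])                         # training means
  s  <- vapply(train[cols], sd, numeric(1))           # training SDs
  train[cols] <- as.data.frame(scale(train[cols], center = mu, scale = s))
  test[cols]  <- as.data.frame(scale(test[cols],  center = mu, scale = s))
  list(train = train, test = test)
}

# Toy data with 3% positives (not the AI4I data).
set.seed(1)
y_toy <- rep(c(0, 1), times = c(970, 30))
toy   <- data.frame(torque_nm = rnorm(1000, 40, 10),
                    tool_wear_min = runif(1000, 0, 250),
                    machine_failure = y_toy)
idx <- stratified_split(toy$machine_failure)
sc  <- standardize_with_train(toy[idx, ], toy[-idx, ], c("torque_nm", "tool_wear_min"))
fit <- glm(machine_failure ~ ., family = binomial, data = sc$train)
```

Stratifying by the target keeps the roughly 3% failure share identical in both partitions. With so few positives, a plain random split can noticeably shift it.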

Performance

at05 <- ev(p,y,0.5); at80 <- ev(p,y,thr80)
tibble(
  `Operating point` = c("Default (0.50)", sprintf("High-recall (%.2f)", thr80)),
  Precision = round(c(at05["Precision"], at80["Precision"]),3),
  Recall    = round(c(at05["Recall"],    at80["Recall"]),3),
  F1        = round(c(at05["F1"],         at80["F1"]),3),
  Accuracy  = round(c(at05["Accuracy"],   at80["Accuracy"]),3)
) %>% kable(caption=sprintf("Test-set performance · ROC-AUC = %.3f · PR-AUC = %.3f (baseline prevalence %.3f)",
                            AUC, PRAUC, mean(y)))
Test-set performance · ROC-AUC = 0.891 · PR-AUC = 0.465 (baseline prevalence 0.039)
Operating point | Precision | Recall | F1 | Accuracy
Default (0.50) | 0.816 | 0.263 | 0.397 | 0.969
High-recall (0.02) | 0.118 | 0.890 | 0.208 | 0.733
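The report's `ev()` helper and `thr80` threshold are not shown. Below is a hypothetical stand-in under stated assumptions. A run counts as predicted to fail when its score is at or above the threshold. The high-recall threshold is taken as the highest score that still reaches a target recall. To avoid an optimistic test-set estimate, that threshold would ideally be chosen on training or validation predictions.

```r
# Hypothetical stand-in for the report's ev() helper (its code is not shown).
# A run is predicted to fail when its score is at or above the threshold.
ev_sketch <- function(p, y, thr) {
  pred <- p >= thr
  tp <- sum(pred & y == 1); fp <- sum(pred & y == 0)
  fn <- sum(!pred & y == 1); tn <- sum(!pred & y == 0)
  precision <- if (tp + fp > 0) tp / (tp + fp) else 0
  recall    <- if (tp + fn > 0) tp / (tp + fn) else 0
  f1 <- if (precision + recall > 0) 2 * precision * recall / (precision + recall) else 0
  c(Precision = precision, Recall = recall, F1 = f1, Accuracy = (tp + tn) / length(y))
}

# Highest threshold that still reaches the target recall: sort the scores of the
# actual failures and take the one at position ceiling(target * n_failures).
threshold_for_recall <- function(p, y, target = 0.8) {
  pos <- sort(p[y == 1], decreasing = TRUE)
  pos[max(1, ceiling(target * length(pos) - 1e-9))]
}

scores <- c(0.9, 0.8, 0.3, 0.2)   # toy scores
labels <- c(1, 0, 1, 0)           # toy labels
ev_sketch(scores, labels, 0.5)    # Precision 0.5, Recall 0.5, F1 0.5, Accuracy 0.5
thr <- threshold_for_recall(scores, labels, target = 0.8)   # 0.3
ev_sketch(scores, labels, thr)["Recall"]                    # 1
```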
o<-order(p,decreasing=TRUE); l<-y[o]
roc<-tibble(fpr=c(0,cumsum(1-l)/sum(1-l)), tpr=c(0,cumsum(l)/sum(l)), curve="ROC")
prc<-tibble(rec=cumsum(l)/sum(l), prec=cumsum(l)/seq_along(l))
g1<-ggplot(roc,aes(fpr,tpr))+geom_abline(linetype="dashed",color=slate)+
  geom_line(color=teal,linewidth=1)+coord_equal()+
  labs(title=sprintf("ROC curve (AUC=%.3f)",AUC), x="False positive rate", y="True positive rate")
g2<-ggplot(prc,aes(rec,prec))+geom_hline(yintercept=mean(y),linetype="dashed",color=slate)+
  geom_line(color=amber,linewidth=1)+ylim(0,1)+
  labs(title=sprintf("Precision-Recall (AP=%.3f)",PRAUC), x="Recall", y="Precision")
if (requireNamespace("patchwork", quietly=TRUE)) { library(patchwork); g1 + g2 } else { print(g1); print(g2) }
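ROC and Precision-Recall curves on the held-out test set. PR-AUC is the more informative metric under heavy imbalance: a random classifier scores about 0.5 on ROC-AUC but only about the prevalence (0.039) on PR-AUC, so 0.465 is roughly 12x the chance level.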
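Both summary numbers can be computed without extra packages. The functions below use standard definitions and are a sketch. The report's own `AUC` and `PRAUC` code is not shown, and its PR-AUC may use a different interpolation, so small numerical differences are possible.

```r
# ROC-AUC via the rank-sum (Mann-Whitney) identity: the probability that a
# randomly chosen failure scores higher than a randomly chosen healthy run
# (ties count one half, through average ranks).
roc_auc <- function(p, y) {
  r  <- rank(p)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Average precision: mean of precision@k taken at the rank of each failure.
# This is a step-wise summary of the PR curve, not a trapezoidal area.
average_precision <- function(p, y) {
  o <- order(p, decreasing = TRUE); l <- y[o]
  prec_at_k <- cumsum(l) / seq_along(l)
  sum(prec_at_k[l == 1]) / sum(l)
}

scores <- c(0.9, 0.8, 0.3, 0.2)
labels <- c(1, 0, 1, 0)
roc_auc(scores, labels)            # 0.75
average_precision(scores, labels)  # (1 + 2/3) / 2 = 0.833...
```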

Feature importance

co <- summary(fit)$coefficients
data.frame(term=rownames(co), z=abs(co[,"z value"]))[-1,] %>%
  arrange(z) %>% mutate(term=factor(term, levels=term)) %>%
  ggplot(aes(z, term)) + geom_col(fill=navy) +
  labs(title="Feature Importance (|z| on standardized features)", x="|z value|", y=NULL)
Absolute Wald z statistics (coefficient divided by its standard error) from the logistic fit on standardized features. Engineered power and temperature-difference features and tool wear rank highest, consistent with H4.

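⚠️ Caveat: power_w and temp_diff are derived from the raw inputs, so multicollinearity is expected. temp_diff is in fact an exact linear combination of the two temperatures, so if both raw temperatures are also in the model, glm() reports one of the three coefficients as NA (aliased). Individual coefficients should be read as a ranking guide, not precise effects. A tree-based model (next step) would handle interactions and the U-shape natively.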
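The caveat can be made concrete with a small simulation. The temperatures and the failure model are made-up assumptions, not the AI4I data. The sketch shows two things: an exactly derived feature such as a temperature difference is aliased when both of its inputs are in the model, and the |z| used for the ranking is the estimate divided by its standard error, not a standardized coefficient.

```r
# Simulated temperatures (made-up parameters, not the AI4I data).
set.seed(1)
n    <- 500
air  <- rnorm(n, mean = 300, sd = 2)
proc <- air + 10 + rnorm(n, sd = 1)
sim  <- data.frame(air = air, proc = proc, diff = proc - air)
sim$fail <- rbinom(n, 1, plogis(-3 - 1.5 * (sim$diff - 10)))

# diff is an exact linear combination of proc and air, so glm() cannot
# identify it alongside both: its coefficient is reported as NA (aliased).
fit <- glm(fail ~ air + proc + diff, family = binomial, data = sim)
coef(fit)

# summary() drops the aliased row; the reported z value is Estimate / Std. Error.
co <- summary(fit)$coefficients
cbind(z = co[, "z value"], ratio = co[, "Estimate"] / co[, "Std. Error"])
```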

From prediction to action — an explainable early-warning rule

Models are only useful if they drive decisions. A single transparent rule — flag a run if tool wear ≥ 200 min, or power leaves the 3,500–9,000 W band, or torque < 20 Nm — already captures most failures:

tibble(Metric=c("Runs flagged","Failures caught (recall)","Precision of flag","Est. addressable cost"),
       Value=c(sprintf("%.1f%%", rule_flagged), sprintf("%.0f%%", rule_recall),
               sprintf("%.0f%%", rule_prec), paste0("$", fmt(rule_saved)))) %>%
  kable(caption="Explainable early-warning rule (no model scoring required on the floor)")
Explainable early-warning rule (no model scoring required on the floor)
Metric | Value
Runs flagged | 9.0%
Failures caught (recall) | 70%
Precision of flag | 30%
Est. addressable cost | $277,565

This rule needs no live model scoring, is auditable by engineers, and is the natural MVP before deploying a calibrated model that tunes the recall/precision trade-off to the plant’s tolerance for false alarms.
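The rule is simple enough to express as a vectorized predicate. This sketch is hypothetical: the boundary choices follow the wording above (wear at or above 200 min; power strictly below 3,500 W or above 9,000 W; torque strictly below 20 Nm), and the four runs are made up.

```r
# Hypothetical implementation of the rule as stated in the text.
flag_run <- function(tool_wear_min, power_w, torque_nm) {
  tool_wear_min >= 200 | power_w < 3500 | power_w > 9000 | torque_nm < 20
}

# Share of runs flagged, recall and precision of the flag, all in %.
rule_metrics <- function(flag, y) {
  tp <- sum(flag & y == 1)
  c(flagged_pct   = 100 * mean(flag),
    recall_pct    = 100 * tp / sum(y == 1),
    precision_pct = if (any(flag)) 100 * tp / sum(flag) else NA)
}

# Four made-up runs: worn tool, low power, low torque, healthy.
flag <- flag_run(tool_wear_min = c(210, 100, 100, 100),
                 power_w       = c(6000, 3000, 6000, 6000),
                 torque_nm     = c(40, 40, 15, 40))
m <- rule_metrics(flag, y = c(1, 0, 1, 1))
m   # flagged 75; recall and precision about 66.7
```

Under the same assumptions, the addressable cost would be the sum of the estimated costs of runs where `flag & y == 1`.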

Limitations & next steps

AI4I 2020 is a synthetic dataset, so relationships are cleaner and more rule-like than real shop-floor data; the findings are a well-behaved baseline to validate against live equipment. Because the failures in this dataset are generated from documented rules on these process variables, the relationships are causal by construction here. On real equipment we describe them only as associations until controlled validation confirms causation.
  • Imbalance (~25:1): add class weights / SMOTE and keep recall-focused evaluation.
  • Non-linearity: the linear logit underuses the U-shaped torque effect — fit tree/boosted models next.
  • No time dimension: remaining-useful-life and drift monitoring are out of scope here.
  • Next: (1) build the interactive Tableau dashboard; (2) calibrated tree model + threshold policy; (3) deploy the early-warning rule as an MVP and measure realized savings.
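Class weighting, the first imbalance remedy listed above, can be sketched with `glm()`'s `weights` argument. The simulated data and the weighting scheme (failures up-weighted by the rounded healthy-to-failed ratio) are assumptions for illustration. SMOTE is not shown.

```r
# Simulated runs with a rare, wear-driven failure (made-up parameters).
set.seed(7)
n    <- 2000
wear <- runif(n, 0, 250)
fail <- rbinom(n, 1, plogis(-5 + 0.012 * wear))

# Up-weight failures by the (rounded) imbalance ratio; integer weights keep
# the binomial glm free of non-integer-count warnings.
n0 <- sum(fail == 0); n1 <- sum(fail == 1)
w  <- ifelse(fail == 1, round(n0 / n1), 1)

fit_plain    <- glm(fail ~ wear, family = binomial)
fit_weighted <- glm(fail ~ wear, family = binomial, weights = w)

# Weighting shifts predicted probabilities upward, so they are no longer
# calibrated to the true failure rate; thresholds must be re-tuned afterwards.
c(plain = mean(fitted(fit_plain)), weighted = mean(fitted(fit_weighted)))
```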

Conclusion — revisiting the hypotheses

All four hypotheses hold: tool wear shows a threshold effect (H1), torque acts through a U-shape (H2), variant is a significant factor (H3, χ² = 13.2, p ≈ 0.0014), and the engineered power/temperature-difference features rank highest in importance (H4). Failure here is systematic and predictable, which is the basis for the upcoming Tableau dashboard and a predictive-maintenance program.

📚 References & Data

Dataset access

AI4I 2020 Predictive Maintenance Dataset — a synthetic dataset reflecting real predictive-maintenance data from an industrial milling process (10,000 runs, 14 features, five failure modes).

References (APA 7th edition)

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953

Few, S. (2006). Information dashboard design: The effective visual communication of data. O’Reilly Media.

He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284. https://doi.org/10.1109/TKDE.2008.239

Kelly, M., Longjohn, R., & Nottingham, K. (2023). The UCI Machine Learning Repository. University of California, Irvine. https://archive.ics.uci.edu

Knaflic, C. N. (2015). Storytelling with data: A data visualization guide for business professionals. Wiley.

Matzka, S. (2020). Explainable artificial intelligence for predictive maintenance applications. In 2020 Third International Conference on Artificial Intelligence for Industries (AI4I) (pp. 69–74). IEEE. https://doi.org/10.1109/AI4I49448.2020.00023

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686