ALY 6070 · Communication and Visualization for Data Analytics
An EDA-to-baseline-model study of 10,000 milling-machine runs — combining a data scientist’s search for predictive signal with a business analyst’s view of cost, risk, and action.
Across 10,000 runs, only 3.91% fail, yet failure is highly structured. Power and Overstrain modes drive most cost; the Low-quality variant fails most (4.49%); and risk concentrates at both torque extremes and once tool wear exceeds ~200 min. A leakage-controlled logistic baseline already separates failures well (ROC-AUC 0.89, PR-AUC 0.46), and a single explainable rule catches 70% of failures — evidence that a predictive-maintenance program is both feasible and worthwhile.
Color key: High risk / Failure · Moderate risk · Low risk / Safe band · Healthy run · Process variable
```r
tibble(mode=names(mc), n=as.integer(mc)) %>% mutate(lab=mode_full[mode]) %>% arrange(n) %>%
  mutate(lab=factor(lab, levels=lab)) %>%
  ggplot(aes(n, lab, fill=mode)) + geom_col() +
  geom_text(aes(label=n), hjust=-0.25, size=3.4, color=ink) +
  scale_fill_manual(values=mode_cols, guide="none") +
  scale_x_continuous(expand=expansion(mult=c(0,.15))) +
  labs(title="Failures by Mode", x="Number of Failures", y=NULL)
```

Power & Heat-Dissipation modes account for most stoppages.
```r
tibble(type=names(ft), r=as.numeric(ft)) %>%
  mutate(type=factor(type, levels=c("Low","Medium","High")),
         col=c(Low=red, Medium=green, High=amber)[as.character(type)]) %>%
  ggplot(aes(type, r, fill=col)) + geom_col() +
  geom_text(aes(label=sprintf('%.2f%%', r)), vjust=-0.4, size=3.4, color=ink) +
  scale_fill_identity() + scale_y_continuous(labels=function(x)paste0(x,"%"), expand=expansion(mult=c(0,.13))) +
  labs(title="Failure Rate by Product Quality", x="Product Variant", y="Failure Rate")
```

The Low-quality variant fails about 1.5 times as often as the High variant.
```r
df %>% mutate(b=cut(torque_nm,c(0,20,30,40,50,60,90),labels=c("<20","20-30","30-40","40-50","50-60","60+"))) %>%
  group_by(b) %>% summarise(r=mean(machine_failure)*100,.groups="drop") %>%
  ggplot(aes(b, r, fill=riskcol(r))) + geom_col() +
  geom_text(aes(label=sprintf('%.0f%%', r)), vjust=-0.4, size=3.2, color=ink) +
  scale_fill_identity() + scale_y_continuous(labels=function(x)paste0(x,"%"), expand=expansion(mult=c(0,.13))) +
  labs(title="Failures Cluster at Low & High Torque", x="Torque [Nm]", y="Failure Rate")
```

Risk is U-shaped: it is lowest in the mid-range and highest at both torque extremes.
```r
# include.lowest=TRUE keeps runs with exactly 0 min of wear; with cut()'s default
# right-closed intervals, (0,50] would exclude them and they would become NA
df %>% mutate(b=cut(tool_wear_min,c(0,50,100,150,200,260),include.lowest=TRUE,
                    labels=c("0-50","50-100","100-150","150-200","200+"))) %>%
  group_by(b) %>% summarise(r=mean(machine_failure)*100,.groups="drop") %>%
  ggplot(aes(b, r, fill=riskcol(r))) + geom_col() +
  geom_text(aes(label=sprintf('%.0f%%', r)), vjust=-0.4, size=3.2, color=ink) +
  scale_fill_identity() + scale_y_continuous(labels=function(x)paste0(x,"%"), expand=expansion(mult=c(0,.13))) +
  labs(title="Failure Rate vs Tool Wear", x="Tool Wear [min]", y="Failure Rate")
```

A clear threshold: risk jumps roughly fivefold once tool wear passes 200 min.
```r
tibble(mode=names(cost_v), cost=as.numeric(cost_v)) %>% mutate(lab=mode_full[mode]) %>% arrange(cost) %>%
  mutate(lab=factor(lab, levels=lab)) %>%
  ggplot(aes(cost, lab, fill=mode)) + geom_col() +
  geom_text(aes(label=paste0('$', round(cost/1000),'K')), hjust=-0.2, size=3.2, color=ink) +
  scale_fill_manual(values=mode_cols, guide="none") +
  scale_x_continuous(labels=dollar, expand=expansion(mult=c(0,.18))) +
  labs(title="Estimated Downtime Cost by Mode*", x="Estimated Cost (USD)", y=NULL)
```

Power & Overstrain dominate cost. Overstrain rivals Power despite having fewer events because each Overstrain event carries a longer assumed repair time.
At an assumed blended rate of $250 per hour of unplanned downtime and mode-specific repair times, the 391 failures represent roughly $399,000 in avoidable cost (about $1,020 per event). The explainable early-warning rule (see Modeling) would catch about 70% of failures, corresponding to an estimated $277,565 in addressable cost.
*Cost figures are illustrative assumptions for prioritization, not measured financials.
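The headline cost figures can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted above, and the $250/hr rate is itself an assumption of the analysis. The implied average downtime per failure is derived from those numbers, not measured.

```r
# Cross-check of the illustrative downtime-cost figures quoted above
rate_per_hr <- 250      # assumed blended cost of unplanned downtime (USD/hr)
n_failures  <- 391      # failed runs out of 10,000
total_cost  <- 399000   # estimated total downtime cost (USD)
addressable <- 277565   # estimated cost of failures the rule would catch (USD)

cost_per_event  <- total_cost / n_failures         # about USD 1,020 per failure
hours_per_event <- cost_per_event / rate_per_hr    # implied mean downtime per failure (hours)
addressable_pct <- 100 * addressable / total_cost  # share of total cost the rule reaches

print(round(c(cost_per_event  = cost_per_event,
              hours_per_event = hours_per_event,
              addressable_pct = addressable_pct), 2))
```

The implied average of roughly 4.1 hours of downtime per failure is a useful sanity check on the assumed repair times. The addressable share (just under 70%) lines up with the rule's recall.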
```r
tribble(
  ~Variable, ~Type, ~Units, ~`Why it matters`,
  "udi, product_id","Identifier","—","Run / part traceability",
  "type (Low/Med/High)","Categorical","—","Quality variant; sets failure thresholds",
  "air_temperature_k","Numeric","K","Ambient input to heat dissipation",
  "process_temperature_k","Numeric","K","Machining heat; drives HDF with air temp",
  "rotational_speed_rpm","Numeric","rpm","Speed; with torque sets power",
  "torque_nm","Numeric","Nm","Load; drives power & overstrain",
  "tool_wear_min","Numeric","min","Cumulative wear; drives TWF & OSF",
  "machine_failure","Binary (target)","0/1","Did the run fail? (headline KPI)",
  "twf/hdf/pwf/osf/rnf","Binary","0/1","Five specific failure modes (the 'why')"
) %>% kable()
```

| Variable | Type | Units | Why it matters |
|---|---|---|---|
| udi, product_id | Identifier | — | Run / part traceability |
| type (Low/Med/High) | Categorical | — | Quality variant; sets failure thresholds |
| air_temperature_k | Numeric | K | Ambient input to heat dissipation |
| process_temperature_k | Numeric | K | Machining heat; drives HDF with air temp |
| rotational_speed_rpm | Numeric | rpm | Speed; with torque sets power |
| torque_nm | Numeric | Nm | Load; drives power & overstrain |
| tool_wear_min | Numeric | min | Cumulative wear; drives TWF & OSF |
| machine_failure | Binary (target) | 0/1 | Did the run fail? (headline KPI) |
| twf/hdf/pwf/osf/rnf | Binary | 0/1 | Five specific failure modes (the ‘why’) |
```r
df %>% select(all_of(numv)) %>% pivot_longer(everything(), names_to="Variable") %>%
  group_by(Variable) %>%
  summarise(n=n(), Mean=mean(value), SD=sd(value), Min=min(value),
            Median=median(value), Max=max(value), .groups="drop") %>%
  mutate(across(c(Mean,SD,Min,Median,Max), ~round(.,2))) %>%
  kable(caption="Descriptive statistics (numeric variables)")
```

| Variable | n | Mean | SD | Min | Median | Max |
|---|---|---|---|---|---|---|
| air_temperature_k | 10000 | 300.00 | 2.00 | 296.40 | 300.6 | 303.90 |
| power_w | 10000 | 6286.47 | 1074.45 | 775.92 | 6387.2 | 10665.67 |
| process_temperature_k | 10000 | 310.20 | 1.49 | 305.80 | 310.3 | 314.40 |
| rotational_speed_rpm | 10000 | 1539.66 | 178.28 | 1168.00 | 1537.0 | 2196.00 |
| temp_diff | 10000 | 10.20 | 1.01 | 6.80 | 10.2 | 13.70 |
| tool_wear_min | 10000 | 114.31 | 52.31 | 0.00 | 109.0 | 250.00 |
| torque_nm | 10000 | 40.00 | 10.08 | 3.50 | 40.0 | 87.20 |
```r
# Tukey's rule: flag values more than 1.5 x IQR beyond the quartiles
map_dfr(numv, function(v){x<-df[[v]];q<-quantile(x,c(.25,.75));iqr<-q[2]-q[1]
  lo<-q[1]-1.5*iqr;hi<-q[2]+1.5*iqr
  tibble(Variable=v, Outliers=sum(x<lo|x>hi), `% of rows`=round(mean(x<lo|x>hi)*100,2))}) %>%
  kable(caption="Mild outliers flagged by the 1.5xIQR rule")
```

| Variable | Outliers | % of rows |
|---|---|---|
| air_temperature_k | 0 | 0.00 |
| process_temperature_k | 0 | 0.00 |
| rotational_speed_rpm | 37 | 0.37 |
| torque_nm | 67 | 0.67 |
| tool_wear_min | 0 | 0.00 |
| power_w | 161 | 1.61 |
| temp_diff | 31 | 0.31 |
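A handful of mild outliers appear in torque and speed (the operating tails). The engineered power_w (1.61% of rows) and temp_diff (0.31%) features also show some. These points are informative, not errors: the tails are exactly where failures occur, so all outliers are retained.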
Two engineered features encode the physics of the documented failure rules and, as the model later confirms, carry most of the signal:

- power_w = torque × speed (rad/s): linearizes the power-failure mechanism (a "safe band" of 3,500–9,000 W).
- temp_diff = process − air temperature: captures the heat-dissipation condition directly.

Binned versions of torque and tool wear (used in the dashboard) make the non-linear thresholds legible to a business audience.
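Below is a minimal sketch of how the two engineered features might be computed, assuming the column names from the data dictionary. Speed is converted from rpm to rad/s (multiply by 2π/60) so that torque × angular speed gives power in watts. The function name add_physics_features, the in_safe_band flag, and the example runs are hypothetical. Counting the band edges as safe is also an assumption.

```r
# Engineered physics features (sketch; column names follow the data dictionary)
add_physics_features <- function(d) {
  omega <- d$rotational_speed_rpm * 2 * pi / 60            # rpm -> rad/s
  d$power_w   <- d$torque_nm * omega                       # P [W] = torque [Nm] x omega [rad/s]
  d$temp_diff <- d$process_temperature_k - d$air_temperature_k
  d$in_safe_band <- d$power_w >= 3500 & d$power_w <= 9000  # edges counted as safe (assumption)
  d
}

# Three made-up runs, not rows from the dataset
runs <- data.frame(torque_nm             = c(40, 10, 70),
                   rotational_speed_rpm  = c(1500, 2000, 1300),
                   air_temperature_k     = c(300, 298, 302),
                   process_temperature_k = c(310, 308.5, 311))
out <- add_physics_features(runs)
out[, c("power_w", "temp_diff", "in_safe_band")]
```

The first run (40 Nm at 1,500 rpm, about 6,283 W) sits inside the band. The low-torque and high-torque runs fall outside it, on opposite sides, which mirrors the two-sided failure pattern.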
```r
df %>% select(all_of(numv)) %>% pivot_longer(everything()) %>%
  ggplot(aes(value)) + geom_histogram(bins=35, fill=navy, color="white", linewidth=0.2) +
  facet_wrap(~name, scales="free", ncol=3) + labs(title="Distributions of Numeric Variables", x=NULL, y="Count")
```

Temperatures are tightly controlled, torque is approximately Normal(40, 10), speed is right-skewed, and tool wear spans 0–250 min.
```r
corrplot(cor(df %>% select(all_of(numv), machine_failure)), method="color", type="upper",
         addCoef.col="black", number.cex=0.65, tl.cex=0.72, tl.col="black",
         col=colorRampPalette(c(red,"white",navy))(200), mar=c(0,0,1,0))
```

Torque and speed are near mirror images; failure has little LINEAR tie to any single variable.
Because the failure signal follows a band rather than a straight-line trend, we engineer power_w to linearize it.
The chi-square test of independence between product variant and failure gives χ² = 13.2, p = 0.0014, so the association in H3 is statistically significant — failure rate genuinely differs by variant.
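The reported statistic and p-value can be checked against each other. A table of product variant (3 levels) by failure (2 levels) has (3 − 1)(2 − 1) = 2 degrees of freedom. The sketch below recovers p from χ² = 13.2. It assumes a standard Pearson test; R's chisq.test applies Yates' continuity correction only to 2 × 2 tables, so that correction does not arise here.

```r
# Pearson chi-square test: variant (3 levels) x failure (2 levels)
chi2 <- 13.2
dof  <- (3 - 1) * (2 - 1)                        # = 2 degrees of freedom
p_value <- pchisq(chi2, df = dof, lower.tail = FALSE)
signif(p_value, 2)                               # upper-tail probability

# With 2 degrees of freedom the upper tail has the closed form exp(-x / 2)
all.equal(p_value, exp(-chi2 / 2))

# On the full data the test itself would be, for example:
# chisq.test(table(df$type, df$machine_failure))
```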
```r
pc <- tibble(mode=names(mc), n=as.integer(mc)) %>% arrange(desc(n)) %>%
  mutate(mode=factor(toupper(mode), levels=toupper(mode)), cum=cumsum(n)/sum(n)*100)
# Cumulative % is rescaled onto the count axis; sec_axis() maps it back to 0-100%
ggplot(pc, aes(mode, n)) +
  geom_col(aes(fill=as.character(mode))) +
  geom_line(aes(y=cum/100*max(n), group=1), color=red, linewidth=1) +
  geom_point(aes(y=cum/100*max(n)), color=red) +
  geom_hline(yintercept=0.8*max(pc$n), linetype="dashed", color=slate) +
  scale_fill_manual(values=mode_cols, guide="none") +
  scale_y_continuous(sec.axis=sec_axis(~./max(pc$n)*100, name="Cumulative %")) +
  labs(title="Pareto of Failure Modes", x=NULL, y="Failures")
```

About 80% of failures come from two or three modes: the classic Pareto signal of where to focus.
```r
# include.lowest=TRUE keeps zero-wear runs in the first tool-wear bin instead of NA
df %>% mutate(tb=cut(torque_nm,c(0,25,35,45,55,90),labels=c("<25","25-35","35-45","45-55","55+")),
              wb=cut(tool_wear_min,c(0,60,120,180,260),include.lowest=TRUE,
                     labels=c("0-60","60-120","120-180","180+"))) %>%
  group_by(tb,wb) %>% summarise(r=mean(machine_failure)*100,.groups="drop") %>%
  ggplot(aes(wb, tb, fill=r)) + geom_tile(color="white", linewidth=0.6) +
  geom_text(aes(label=sprintf('%.1f', r)), size=3, color=ink) +
  scale_fill_gradient(low="#fff7ec", high=red, name="Fail %") +
  labs(title="Failure-Rate Risk Surface: Torque x Tool Wear", x="Tool wear [min]", y="Torque [Nm]") +
  theme(legend.position="right")
```

Risk is not additive. It concentrates in the corners: at low torque (any wear) and at high torque WITH high wear.
```r
ggplot(df, aes(power_w, fill=factor(machine_failure))) +
  annotate("rect", xmin=3500, xmax=9000, ymin=0, ymax=Inf, fill=green, alpha=0.10) +
  geom_histogram(bins=60, position="identity", alpha=0.6, color=NA) +
  geom_vline(xintercept=c(3500,9000), linetype="dashed", color=green) +
  scale_fill_manual(values=c(`0`=teal,`1`=red), labels=c("No failure","Failure"), name=NULL) +
  labs(title="Power Output vs Failure (safe band shaded)", x="Power [W]", y="Count")
```

Failures (red) pile up OUTSIDE the 3,500–9,000 W band while healthy runs sit inside it. This gives a direct view of the power-failure rule.
```r
ggplot(df, aes(torque_nm, rotational_speed_rpm)) +
  geom_point(data=filter(df,machine_failure==0), color="grey60", alpha=0.16, size=0.8) +
  geom_point(data=filter(df,machine_failure==1), color=red, alpha=0.85, size=1.6) +
  labs(title="Torque vs Rotational Speed (failures highlighted)", x="Torque [Nm]", y="Rotational speed [rpm]")
```

Failures trace the EDGES of the operating cloud, where the torque-speed balance leaves the safe power band.
```r
df %>% select(machine_failure, torque_nm, tool_wear_min, rotational_speed_rpm) %>%
  pivot_longer(-machine_failure) %>%
  ggplot(aes(factor(machine_failure), value, fill=factor(machine_failure))) +
  geom_boxplot(outlier.size=0.5) + facet_wrap(~name, scales="free_y") +
  scale_fill_manual(values=c(teal,red), guide="none") +
  scale_x_discrete(labels=c("OK","Fail")) +
  labs(title="Key Drivers: Failed vs Healthy Runs", x=NULL, y=NULL)
```

Failed runs carry higher tool wear and a wider torque spread.
```r
df %>% group_by(type) %>% summarise(across(c(twf,hdf,pwf,osf,rnf), sum)) %>%
  pivot_longer(-type, names_to="Mode", values_to="Count") %>% mutate(Mode=toupper(Mode)) %>%
  ggplot(aes(type, Count, fill=Mode)) + geom_col() +
  scale_fill_manual(values=mode_cols) +
  labs(title="Failure Modes by Product Variant", x="Product Variant", y="Failures")
```

The Low-quality variant contributes the most failures, and it has the highest rate as well as the largest volume.
We fit a logistic-regression baseline on a stratified 70/30 train/test split, using only the process variables (the five mode flags are dropped to prevent leakage). Numeric features are standardized, and because of the strong 24.6:1 class imbalance we judge the model with ROC-AUC, PR-AUC, recall and precision rather than accuracy, and tune the decision threshold for a high-recall operating point.
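The modeling protocol above can be sketched in base R. This is an illustration under stated assumptions, not the report's code. The data are simulated from two hypothetical features with made-up coefficients, giving a few percent positives, and stratified_split is a hypothetical helper. Two points matter. First, sampling is done separately within each class, so train and test keep the same failure rate. Second, the standardization means and SDs come from the training split only, so no test information leaks into preprocessing.

```r
set.seed(6070)
# --- Simulated stand-in data (hypothetical; not the AI4I runs) ---
n   <- 2000
sim <- data.frame(torque_nm     = rnorm(n, 40, 10),
                  tool_wear_min = runif(n, 0, 250))
eta <- -3.5 + 0.012 * (sim$tool_wear_min - 125) + 0.03 * (sim$torque_nm - 40)
sim$machine_failure <- rbinom(n, 1, plogis(eta))    # a few percent positives

# Stratified 70/30 split: sample 70% of row indices within each class
stratified_split <- function(y, frac = 0.7) {
  by_class <- split(seq_along(y), y)
  picked <- lapply(by_class, function(i) i[sample.int(length(i), round(frac * length(i)))])
  sort(unlist(picked, use.names = FALSE))
}
tr    <- stratified_split(sim$machine_failure)
train <- sim[tr, ]
test  <- sim[-tr, ]

# Standardize with training-set means and SDs only, so the test set stays unseen
feats <- c("torque_nm", "tool_wear_min")
for (f in feats) {
  mu <- mean(train[[f]]); s <- sd(train[[f]])
  train[[f]] <- (train[[f]] - mu) / s
  test[[f]]  <- (test[[f]]  - mu) / s
}

# Logistic-regression baseline and held-out failure probabilities
fit    <- glm(machine_failure ~ torque_nm + tool_wear_min, data = train, family = binomial)
p_test <- predict(fit, newdata = test, type = "response")

# Class-imbalance ratio quoted in the text: healthy runs per failure
imbalance <- (10000 - 391) / 391
round(imbalance, 1)
```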
```r
at05 <- ev(p,y,0.5); at80 <- ev(p,y,thr80)
tibble(
  `Operating point` = c("Default (0.50)", sprintf("High-recall (%.2f)", thr80)),
  Precision = round(c(at05["Precision"], at80["Precision"]),3),
  Recall    = round(c(at05["Recall"], at80["Recall"]),3),
  F1        = round(c(at05["F1"], at80["F1"]),3),
  Accuracy  = round(c(at05["Accuracy"], at80["Accuracy"]),3)
) %>% kable(caption=sprintf("Test-set performance · ROC-AUC = %.3f · PR-AUC = %.3f (baseline prevalence %.3f)",
                            AUC, PRAUC, mean(y)))
```

| Operating point | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|
| Default (0.50) | 0.816 | 0.263 | 0.397 | 0.969 |
| High-recall (0.02) | 0.118 | 0.890 | 0.208 | 0.733 |
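The helpers ev() and thr80 used to build the table above are not shown in this section. The sketch below is one plausible way to compute the same four metrics at a chosen cut-off. It also picks the highest threshold that still reaches a target recall, which is the k-th largest score among true failures. The names metrics_at and threshold_for_recall are hypothetical, and the toy scores are made up. Even in the toy data, lowering the threshold buys full recall at the cost of precision and accuracy, the same trade-off the table shows.

```r
# Precision, recall, F1 and accuracy when runs with score >= thr are flagged
metrics_at <- function(p, y, thr) {
  pred <- p >= thr
  tp <- sum(pred & y == 1); fp <- sum(pred & y == 0)
  fn <- sum(!pred & y == 1); tn <- sum(!pred & y == 0)
  precision <- if (tp + fp > 0) tp / (tp + fp) else NA_real_
  recall    <- tp / (tp + fn)
  f1 <- 2 * precision * recall / (precision + recall)   # NaN if nothing is caught
  c(Precision = precision, Recall = recall, F1 = f1, Accuracy = (tp + tn) / length(y))
}

# Highest threshold with recall >= target: the k-th largest failure score,
# where k = ceiling(target * number of failures)
threshold_for_recall <- function(p, y, target = 0.8) {
  pos <- sort(p[y == 1], decreasing = TRUE)
  stopifnot(length(pos) > 0)
  k <- ceiling(target * length(pos) - 1e-9)   # small tolerance guards float round-off
  pos[max(k, 1)]
}

p <- c(0.9, 0.8, 0.7, 0.6, 0.45, 0.4, 0.3, 0.2, 0.1, 0.05)   # toy scores
y <- c(1,   1,   0,   0,   0,    0,   1,   0,   0,   0)      # toy labels
m05  <- metrics_at(p, y, 0.5)
thr  <- threshold_for_recall(p, y, 0.8)
m_hi <- metrics_at(p, y, thr)
round(rbind(default = m05, high_recall = m_hi), 3)
```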
```r
# Rank test cases by predicted risk, then sweep the threshold down the ranking
o<-order(p,decreasing=TRUE); l<-y[o]
roc<-tibble(fpr=c(0,cumsum(1-l)/sum(1-l)), tpr=c(0,cumsum(l)/sum(l)), curve="ROC")
prc<-tibble(rec=cumsum(l)/sum(l), prec=cumsum(l)/seq_along(l))
g1<-ggplot(roc,aes(fpr,tpr))+geom_abline(linetype="dashed",color=slate)+
  geom_line(color=teal,linewidth=1)+coord_equal()+
  labs(title=sprintf("ROC curve (AUC=%.3f)",AUC), x="False positive rate", y="True positive rate")
g2<-ggplot(prc,aes(rec,prec))+geom_hline(yintercept=mean(y),linetype="dashed",color=slate)+
  geom_line(color=amber,linewidth=1)+ylim(0,1)+
  labs(title=sprintf("Precision-Recall (AP=%.3f)",PRAUC), x="Recall", y="Precision")
# Show the two panels side by side when patchwork is installed
if (requireNamespace("patchwork", quietly=TRUE)) { library(patchwork); g1 + g2 } else { print(g1); print(g2) }
```

ROC and precision-recall curves on the held-out test set. The dashed PR line marks the failure prevalence (a no-skill classifier). PR-AUC is the more informative metric under heavy class imbalance.
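The chunk above uses AUC and PRAUC without showing how they were computed. One standard pair of estimators is sketched below; it is an assumption, not the report's code. The first is ROC-AUC as the Mann–Whitney probability that a random failure outscores a random healthy run. The second is average precision (AP), the step-wise area under the PR curve that the plot title reports. A no-skill classifier has AP roughly equal to the failure prevalence (about 0.039 here), so a PR-AUC of 0.46 is about twelve times the chance level.

```r
# ROC-AUC via the rank-sum (Mann-Whitney U) identity
roc_auc <- function(p, y) {
  r  <- rank(p)                  # average ranks, so a tied pair counts as 1/2
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Average precision: mean of precision@k taken at the rank of each failure
average_precision <- function(p, y) {
  l <- y[order(p, decreasing = TRUE)]
  prec_at_k <- cumsum(l) / seq_along(l)
  sum(prec_at_k[l == 1]) / sum(l)
}

p <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)   # toy scores
y <- c(1,   0,   1,   0,   1,   0,   0,   0)     # toy labels
c(AUC = roc_auc(p, y), AP = average_precision(p, y), prevalence = mean(y))
```

In the toy example, AUC is 0.8 (12 of the 15 failure/healthy pairs are ordered correctly). AP is about 0.756, against a prevalence baseline of 0.375.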
```r
co <- summary(fit)$coefficients
# Drop the intercept row and rank terms by the absolute Wald z-statistic
data.frame(term=rownames(co), z=abs(co[,"z value"]))[-1,] %>%
  arrange(z) %>% mutate(term=factor(term, levels=term)) %>%
  ggplot(aes(z, term)) + geom_col(fill=navy) +
  labs(title="Feature Importance (|z|, standardized features)", x="|z value|", y=NULL)
```

Absolute Wald z-statistics of the logistic coefficients (features standardized). The engineered power and temperature-difference features, together with tool wear, carry the most signal, confirming H4.
Because power_w and temp_diff are derived from the raw inputs, some multicollinearity is expected. Individual coefficients should therefore be read as a ranking guide rather than as precise effects. A tree-based model (the suggested next step) would handle interactions and the U-shaped torque effect natively.
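Variance inflation factors give one way to gauge that multicollinearity: VIF_j = 1/(1 − R²_j), where R²_j comes from regressing feature j on the other features. Common rules of thumb treat values above 5 to 10 as a warning sign. The sketch below is illustrative only. The data are simulated so that torque falls as speed rises, with made-up coefficients, and vif is a hypothetical helper built on base R's lm(). A related point: temp_diff is an exact linear combination of process and air temperature. If all three entered one model, R's glm() would report one coefficient as NA (aliased).

```r
set.seed(1)
# Simulated stand-in (hypothetical): torque falls as speed rises
n <- 1000
speed  <- rnorm(n, 1540, 180)
torque <- 40 - 0.05 * (speed - 1540) + rnorm(n, 0, 4)
X <- data.frame(torque_nm            = torque,
                rotational_speed_rpm = speed,
                power_w              = torque * speed * 2 * pi / 60)

# VIF for each column: regress it on all other columns, then 1 / (1 - R^2)
vif <- function(X) {
  sapply(names(X), function(v) {
    f  <- reformulate(setdiff(names(X), v), response = v)
    r2 <- summary(lm(f, data = X))$r.squared
    1 / (1 - r2)
  })
}
round(vif(X), 1)
```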
Models are only useful if they drive decisions. A single transparent rule — flag a run if tool wear ≥ 200 min, or power leaves the 3,500–9,000 W band, or torque < 20 Nm — already captures most failures:
```r
tibble(Metric=c("Runs flagged","Failures caught (recall)","Precision of flag","Est. addressable cost"),
       Value=c(sprintf("%.1f%%", rule_flagged), sprintf("%.0f%%", rule_recall),
               sprintf("%.0f%%", rule_prec), paste0("$", fmt(rule_saved)))) %>%
  kable(caption="Explainable early-warning rule (no model scoring required on the floor)")
```

| Metric | Value |
|---|---|
| Runs flagged | 9.0% |
| Failures caught (recall) | 70% |
| Precision of flag | 30% |
| Est. addressable cost | $277,565 |
This rule needs no live model scoring, is auditable by engineers, and is the natural MVP before deploying a calibrated model that tunes the recall/precision trade-off to the plant’s tolerance for false alarms.
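The early-warning rule is simple enough to express as one vectorized function. Below is a minimal sketch using made-up runs. The text does not say whether 3,500 W and 9,000 W count as inside the band; here the edges are assumed safe. The names flag_run and rule_summary are hypothetical. rule_summary returns the same three rates as the table: share of runs flagged, recall, and precision.

```r
# Flag a run if tool wear >= 200 min, power leaves the 3,500-9,000 W band,
# or torque < 20 Nm (band edges treated as safe: an assumption)
flag_run <- function(tool_wear_min, power_w, torque_nm) {
  tool_wear_min >= 200 | power_w < 3500 | power_w > 9000 | torque_nm < 20
}

# Percent of runs flagged, recall (failures caught) and precision of the flag
rule_summary <- function(flag, failed) {
  c(flagged_pct   = 100 * mean(flag),
    recall_pct    = 100 * sum(flag & failed) / sum(failed),
    precision_pct = 100 * sum(flag & failed) / sum(flag))
}

runs <- data.frame(tool_wear_min = c(210, 50, 120, 30, 90),        # made-up runs
                   power_w       = c(6000, 3000, 6000, 6500, 9500),
                   torque_nm     = c(45, 15, 40, 42, 60),
                   failed        = c(TRUE, TRUE, TRUE, FALSE, FALSE))
flags <- with(runs, flag_run(tool_wear_min, power_w, torque_nm))
round(rule_summary(flags, runs$failed), 1)
```

The reported figures are internally consistent. Flagging 9.0% of 10,000 runs at 30% precision catches about 270 failures, roughly 69% of 391, which matches the stated 70% recall allowing for rounding.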
AI4I 2020 Predictive Maintenance Dataset — a synthetic dataset reflecting real predictive-maintenance data from an industrial milling process (10,000 runs, 14 features, five failure modes).
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953
Few, S. (2006). Information dashboard design: The effective visual communication of data. O’Reilly Media.
He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284. https://doi.org/10.1109/TKDE.2008.239
Kelly, M., Longjohn, R., & Nottingham, K. (2023). The UCI Machine Learning Repository. University of California, Irvine. https://archive.ics.uci.edu
Knaflic, C. N. (2015). Storytelling with data: A data visualization guide for business professionals. Wiley.
Matzka, S. (2020). Explainable artificial intelligence for predictive maintenance applications. In 2020 Third International Conference on Artificial Intelligence for Industries (AI4I) (pp. 69–74). IEEE. https://doi.org/10.1109/AI4I49448.2020.00023
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686