2026-04-11

Dataset & Problem

Industrial machines frequently break down, and the unexpected downtime costs companies money. Ideally, machinery would be monitored so that a machine nearing failure is recognized and preventative maintenance can be performed before total failure. This presentation analyzes sensor data from industrial machinery and attempts to predict failure using linear regression. We fit a model to known-good baseline data for oil pressure vs. power consumption and use it to identify anomalous sensor readings, assisting in the detection of machine failure. We focus on one machine type, since each machine type has different baseline and anomaly characteristics. The data set was obtained from Kaggle and is focused on identifying machine failure [https://www.kaggle.com/datasets/sdeogade/sparse-industrial-machine-time-series-dataset?resource=download].

Defining the Model

\[Y = \beta_0 + \beta_1 X + \epsilon\]
\(Y\): Predicted Power Consumption (kW)
\(X\): Oil Pressure (bar)
\(\beta_0\): Baseline power
\(\beta_1\): Slope of oil pressure vs power
\(\epsilon\): Noise factor

Dataset Loading, Filtering and Plotting

library(tidyverse) # read_csv(), %>%, filter(), ggplot()

machine_data <- read_csv("./industrial_machine_data.csv")

# Keep a single lathe (asset AST-1041) and only its healthy (non-breakdown) rows.
lathe_baseline <- machine_data %>%
  filter(asset_tag == "AST-1041", breakdown_flag == 0)

ggplot(lathe_baseline, aes(x = oil_pressure_bar, y = power_consumption_kw)) +
  geom_point(alpha = 0.3, color = "#2c3e50", size=2) +
  geom_smooth(method = "lm", color = "#e74c3c", linewidth = 1.2, se = TRUE) + # Adds regression line
  labs(
    title = "Baseline Oil Pressure vs Power Curve (CNC Lathe)",
    x = "Oil Pressure (bar)",
    y = "Power Consumption (kW)"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.minor = element_blank()
  )
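The regression line drawn by geom_smooth(method = "lm") can also be fitted explicitly so that the coefficients from the model definition are available as numbers. The following is a minimal sketch; the helper name fit_baseline is hypothetical, and it assumes the column names used in the plot above.

```r
# Fit power consumption as a linear function of oil pressure.
# `data` is assumed to contain the columns oil_pressure_bar and
# power_consumption_kw, as in the plotting code above.
fit_baseline <- function(data) {
  lm(power_consumption_kw ~ oil_pressure_bar, data = data)
}

# Usage (assumes lathe_baseline from the chunk above):
# baseline_fit <- fit_baseline(lathe_baseline)
# coef(baseline_fit)                # beta_0 (intercept) and beta_1 (slope)
# summary(baseline_fit)$r.squared   # share of variance explained by the line
# sigma(baseline_fit)               # residual standard error (noise scale)
```

Looking at the slope together with the R-squared value gives a quick read on how much of the power consumption the oil pressure actually explains on healthy data.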

Baseline Lathe Data

Baseline vs Anomaly

Analysis:
Our original hypothesis, that anomalous data could be identified from the oil pressure vs. power consumption relationship, did not yield a good anomaly detection model. The anomalous data is mixed in with our baseline “Healthy” data, which indicates that the model cannot accurately predict failure. Ideally, the anomalous points would have been clearly separated from the healthy points. This is a key finding: we need to change direction and look for a variable pair that better separates healthy operation from failure.
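The overlap described above could also be checked numerically rather than only by eye. The sketch below is one possible approach, not the method used for the slide: it measures what share of rows falls outside the baseline model's prediction interval. The helper name share_outside_interval is hypothetical, and the usage assumes that breakdown_flag == 1 marks the anomalous rows.

```r
# Share of rows in `new_data` whose observed power consumption lies outside
# the prediction interval of `fit`, an lm fitted on healthy data only.
share_outside_interval <- function(fit, new_data, level = 0.95) {
  bounds <- predict(fit, newdata = new_data, interval = "prediction", level = level)
  observed <- new_data$power_consumption_kw
  outside <- observed < bounds[, "lwr"] | observed > bounds[, "upr"]
  mean(outside)
}

# Usage (assumes machine_data and lathe_baseline from the loading chunk,
# and that breakdown_flag == 1 marks anomalous rows):
# lathe_failures <- machine_data %>%
#   filter(asset_tag == "AST-1041", breakdown_flag == 1)
# baseline_fit <- lm(power_consumption_kw ~ oil_pressure_bar, data = lathe_baseline)
# share_outside_interval(baseline_fit, lathe_failures)
```

If the breakdown rows follow the same relationship as the healthy rows, only about 5% of them would be expected outside a 95% interval. A share close to that figure would be consistent with the overlap seen in the plot.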

New Hypothesis

A machine's bearing temperature is related to its speed by the equation
\[T_{bearing} = T_{ambient} + \left( \frac{\tau \cdot \pi}{30 \cdot h \cdot A} \right) \text{RPM}\]
\(T_{bearing}\): Temperature of bearing
\(T_{ambient}\): Ambient room temperature of factory
\(\tau\): Friction torque
\(h \cdot A\): Cooling coefficient
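
The factor \(\pi / 30\) can be recovered from a steady-state heat balance. This is a sketch under two assumptions not stated on the slide: \(\tau\) is read as the frictional torque, and all friction heat is removed by convection. Converting shaft speed to angular velocity gives
\[\omega = \frac{2\pi}{60} \text{RPM} = \frac{\pi}{30} \text{RPM}\]
The friction power is \(\tau \cdot \omega\), and at steady state it equals the heat carried away:
\[h \cdot A \left( T_{bearing} - T_{ambient} \right) = \tau \cdot \omega = \frac{\tau \cdot \pi}{30} \text{RPM}\]
Dividing by \(h \cdot A\) and adding \(T_{ambient}\) to both sides gives the equation above.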

This can also be expressed as a simple linear regression
\[\text{Temp} = \beta_0 + \beta_1 \cdot \text{RPM} + \epsilon\]
\(\beta_0\): \(T_{ambient}\) (Y intercept)
\(\beta_1\): Represents the combined friction and cooling coefficients (Slope)
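
The mapping between the regression coefficients and the physical terms can be illustrated with a short fit. This is a minimal sketch: the helper name fit_temp_vs_rpm is hypothetical, and the temperature and RPM column names are passed in as arguments because the actual column names in the data set are not shown here.

```r
# Fit Temp = beta_0 + beta_1 * RPM and label the coefficients physically.
# `temp_col` and `rpm_col` are column names supplied by the caller; they are
# assumptions, not names confirmed from the data set.
fit_temp_vs_rpm <- function(data, temp_col, rpm_col) {
  fit <- lm(reformulate(rpm_col, response = temp_col), data = data)
  est <- coef(fit)
  c(
    ambient_estimate = est[["(Intercept)"]], # beta_0, estimate of T_ambient
    heating_per_rpm  = est[[rpm_col]]        # beta_1, temperature rise per RPM
  )
}

# Usage with hypothetical column names:
# fit_temp_vs_rpm(lathe_baseline, "temperature_c", "rpm")
```

An intercept far from a plausible factory temperature would be a hint that the linear model does not describe the machine well.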

RPM vs Temperature Analysis

Summary

Our original hypothesis did not hold, so we looked for another variable pair that might correlate with failure. On the previous slide, the anomalous and baseline data are clearly separated in the RPM vs. temperature plot, but the separation does not depend on temperature; it is driven entirely by RPM. All of the failures appear to fall within a certain RPM band. This does not necessarily mean that those RPM values indicate impending failure, only that the machine most likely fails when operating in its high RPM range. The analysis shows that simple linear regression has limited use cases, and it underscores the importance of exploring the data and testing hypotheses to find such correlations. Although simple linear regression did not prove effective on this data set, we were able to determine that predicting machine failure requires more advanced techniques.
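
The RPM-band observation could be checked with a simple grouped summary. The sketch below is illustrative only: the helper name rpm_band_by_status is hypothetical, and the RPM column name is passed in as an argument because it is an assumption. The breakdown_flag column comes from the loading code.

```r
library(dplyr)

# Summarise the RPM range of healthy vs. breakdown rows.
# `rpm_col` is a caller-supplied (assumed) column name.
rpm_band_by_status <- function(data, rpm_col) {
  data %>%
    group_by(breakdown_flag) %>%
    summarise(
      n = n(),
      rpm_min = min(.data[[rpm_col]]),
      rpm_max = max(.data[[rpm_col]]),
      .groups = "drop"
    )
}

# Usage with a hypothetical column name:
# machine_data %>%
#   filter(asset_tag == "AST-1041") %>%
#   rpm_band_by_status("rpm")
```

If the breakdown rows occupy a narrow range near the top of the healthy range, that would support the reading that failures cluster at high RPM.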