The 6th Mill is the last mill in each of the two tandems, so the
entire bagasse mass flow must pass through it (as with the 1st
mill). When too much imbibition water is applied, chute level control is
lost because even the top mill speed cannot displace the volume
(bagasse + water) that must pass through the mill. This high chute level
condition increases the probability of blockages and of mechanical damage
to the bagasse feeders, which can cause significant downtime.
The purpose of this analysis is to find an operational constraint
on the imbibition water applied to the 6th Mills, based on process
variables, to prevent damage and downtime.
Scatter - Correlation Plot Function
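library(ggplot2) # plotting functions used throughout this report
# Scatter plot of two columns with a linear fit; the subtitle shows their Pearson correlation
mapa_dispersion <- function(label_x,label_y,dataset) {
correlacion <- round(cor(dataset[[label_x]],dataset[[label_y]]),2)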
ggplot(dataset, aes(.data[[label_x]], .data[[label_y]])) +
geom_point(
color="orange",
fill="#69b3a2",
shape=21,
alpha=0.5,
size=6,
stroke = 2
) +
geom_smooth(method=lm , color="#990000", fill="#FFCF00", se=TRUE) +
ggtitle(paste(label_y,"vs.",label_x), subtitle = paste("Correlation:",correlacion)) +
xlab(label_x) + ylab(label_y)
}
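One detail worth keeping in mind: `mapa_dispersion` calls `cor()` with its default arguments, so a single missing value in either column makes the subtitle read `NA`. The sketch below shows that behaviour and one possible workaround on a hypothetical toy data frame (the `speed` and `water` columns are illustrative, not taken from the dataset).

```r
# Hypothetical toy data; the column names are illustrative only.
toy <- data.frame(
  speed = c(900, 950, 1000, NA, 1100),
  water = c(200, 215, 240, 250, 270)
)

# Default behaviour: any NA in either column makes the result NA.
r_default <- cor(toy$speed, toy$water)

# Alternative: drop incomplete rows before computing the correlation.
r_complete <- round(cor(toy$speed, toy$water, use = "complete.obs"), 2)

print(r_default)
print(r_complete)
```

If the daily averages contain gaps, passing `use = "complete.obs"` inside the function would be one way to keep the subtitle informative.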
TA & TB Dataframes
The datasets used for this analysis include operational variables
like mill speed, power, torque, chute level, etc., as well as variables
related to mass flows like imbibition water, bagasse, fiber, etc. The
variables are daily averages.
dataset <- read.csv(file = 'Dataset.csv')
# Dataset for each tandem
df_TA <- dataset[-c(4,5,8,9,12,13,16,17,20,21,24,25,31,32,33,34,35,38,39,41,43,45,47,49,51)]
df_TB <- dataset[c(1,4,5,8,9,12,13,16,17,20,21,24,25,31,32,33,34,35,38,39,41,43,45,47,49,51,52,53)]
df_TA
df_TB
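Selecting columns by position, as above, depends on the column order of the CSV file staying fixed. As a minimal sketch of an alternative, the same split can be written with column names; the toy data frame and the two name vectors below are hypothetical stand-ins for the real dataset.

```r
# Hypothetical miniature of the dataset: one shared column plus
# one speed tag and one imbibition column per tandem.
toy <- data.frame(
  Dia.Zafra = 1:3,
  ST55M601  = c(900, 950, 1000),  # tandem A tag
  ST55N601  = c(880, 940, 990),   # tandem B tag
  ImbTA     = c(200, 210, 220),
  ImbTB     = c(190, 205, 215)
)

# Columns that belong to only one tandem (assumed lists, for illustration).
cols_TA_only <- c("ST55M601", "ImbTA")
cols_TB_only <- c("ST55N601", "ImbTB")

# Each tandem keeps the shared columns plus its own.
df_TA_toy <- toy[, setdiff(names(toy), cols_TB_only)]
df_TB_toy <- toy[, setdiff(names(toy), cols_TA_only)]

names(df_TA_toy)
names(df_TB_toy)
```

`setdiff()` preserves the order of its first argument, so the resulting data frames keep the original column order.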
Tandem “A”
Density Function: 6th Mill TA Speed (rpm)
We first take a look at the Mill Speed Distribution, to identify the
speed range where the level control of the mill is nearly lost.
# PDF
distr(df_TA,'ST55M601')

# Boxplot
ggplot(df_TA, aes(y=ST55M601)) +
geom_boxplot(
# custom boxes
color="blue",
fill="blue",
alpha=0.2,
# Notch
notch=TRUE,
notchwidth = 0.8,
# custom outliers
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
scale_x_discrete() +
labs(title="Boxplot",x="", y = "ST55M601 (rpm)")

# Descriptive Statistics
data.frame(Estadistica=stat.desc(df_TA$ST55M601))
Maximum Operational Speed: 1200 rpm. Maximum Average Operational
Speed: 1160 rpm.
# Quartiles:
res<-quantile(df_TA$ST55M601, probs = c(0,0.25,0.5,0.75,1))
res
0% 25% 50% 75% 100%
560.6891 868.7622 941.8143 1014.9713 1156.3707
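As a worked check, the quartiles above can be turned into the outlier fences that a boxplot draws under the usual 1.5 × IQR rule (the default whisker rule in `geom_boxplot`). The numbers are copied from the quantile output above; only the variable names are introduced here.

```r
# Quartiles copied from the output above (6th Mill TA speed, rpm).
q1 <- 868.7622
q3 <- 1014.9713
iqr <- q3 - q1                  # interquartile range

# Fences under the 1.5 * IQR rule.
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
print(c(lower = lower_fence, upper = upper_fence))

# Observed extremes, also copied from the output above.
obs_min <- 560.6891
obs_max <- 1156.3707
print(obs_min < lower_fence)    # is the minimum flagged as a low outlier?
print(obs_max > upper_fence)    # is the maximum flagged as a high outlier?
```

With these values the lower fence is about 649 rpm and the upper fence about 1234 rpm, so the slowest daily average falls outside the fences while the fastest one does not.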
Density Function: 6th Mill TA Power (kW)
We examine the power distribution of the mill, as a proxy variable
for identifying an acceptable load range.
# PDF
distr(df_TA,'JT55M601')

# Boxplot
ggplot(df_TA, aes(y=JT55M601)) +
geom_boxplot(
# custom boxes
color="blue",
fill="blue",
alpha=0.2,
# Notch
notch=TRUE,
notchwidth = 0.8,
# custom outliers
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
scale_x_discrete() +
labs(title="Boxplot",x="", y = "JT55M601 (kW)")

# Descriptive Statistics
data.frame(Estadistica=stat.desc(df_TA$JT55M601))
Maximum Average Operational Power: 769 kW
# Quartiles:
res<-quantile(df_TA$JT55M601, probs = c(0,0.25,0.5,0.75,1))
res
0% 25% 50% 75% 100%
347.0196 489.0803 565.8324 660.6705 769.0661
Density Function: 6th Mill TA Chute Level (%)
We examine the chute level distribution of the mill.
# PDF
distr(df_TA,'LT55M601')

# Boxplot
ggplot(df_TA, aes(y=LT55M601)) +
geom_boxplot(
# custom boxes
color="blue",
fill="blue",
alpha=0.2,
notch=TRUE,
notchwidth = 0.8,
# custom outliers
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
scale_x_discrete() +
labs(title="Boxplot",x="", y = "LT55M601 (%)")

# Descriptive Statistics
data.frame(Estadistica=stat.desc(df_TA$LT55M601))
# Quartiles:
res<-quantile(df_TA$LT55M601, probs = c(0,0.25,0.5,0.75,1))
res
0% 25% 50% 75% 100%
13.59563 21.51660 26.01151 31.23077 46.33880
Dataframe Filtering
We filter observations for the following operational ranges:
- Power Range for Normal Sugar Cane Milling.
- Speed Range for High Chute Level Condition in the Mill.
# Filter Dataset by Column Values:
df_TA_filtered <- df_TA[df_TA$JT55M601>=489,] # Keep rows at or above the 1st quartile of power (about 489 kW)
df_TA_filtered <- df_TA_filtered[df_TA_filtered$ST55M601>=1015,] # Keep the top 25% of speeds (3rd quartile, about 1015 rpm)
df_TA_filtered
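The two thresholds above are typed in by hand from the quartile outputs. A minimal sketch of the same two-step filter with the thresholds computed from the data is shown below; the toy data frame and its `power` and `speed` columns are hypothetical.

```r
set.seed(42)

# Hypothetical daily averages, roughly in the ranges reported above.
toy <- data.frame(
  power = runif(200, 350, 770),   # kW
  speed = runif(200, 560, 1160)   # rpm
)

# Thresholds derived from the data instead of being hard-coded.
power_min <- quantile(toy$power, 0.25, names = FALSE)  # 1st quartile of power
speed_min <- quantile(toy$speed, 0.75, names = FALSE)  # 3rd quartile of speed

# Keep normal-load days that also ran in the top speed quartile.
toy_filtered <- subset(toy, power >= power_min & speed >= speed_min)
print(nrow(toy_filtered))
```

One design note: `subset()` drops rows where the condition evaluates to `NA`, whereas bracket indexing with a logical vector that contains `NA` returns rows filled with `NA`.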
Largest Correlation Variables with TA Imbibition Water
We inspect the variables most strongly correlated with Imbibition Water.
corr_var(df_TA_filtered, # dataframe name
ImbTA, # target
max_pvalue = 0.05, # significance level
top = 15, # top n most correlated variables with target
plot = T
)
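The `corr_var` helper is not defined in this document, so its exact behaviour is taken as given. As a rough base-R sketch of the same idea (rank the predictors by absolute correlation with a target and keep only those below a p-value cutoff), assuming numeric columns and using the hypothetical function name `rank_correlations`:

```r
set.seed(1)
n <- 40

# Hypothetical predictors and a target that depends on two of them.
toy <- data.frame(
  speed = rnorm(n, mean = 1050, sd = 20),
  fiber = rnorm(n, mean = 13, sd = 1),
  noise = rnorm(n)
)
toy$imb <- 0.9 * toy$speed - 40 * toy$fiber + rnorm(n, sd = 5)

# Correlate every other column with the target, filter by p-value,
# and sort by absolute correlation (largest first).
rank_correlations <- function(data, target, max_pvalue = 0.05) {
  predictors <- setdiff(names(data), target)
  rows <- lapply(predictors, function(v) {
    test <- cor.test(data[[v]], data[[target]])
    data.frame(variable = v, r = unname(test$estimate), p = test$p.value)
  })
  res <- do.call(rbind, rows)
  res <- res[res$p <= max_pvalue, ]
  res[order(-abs(res$r)), ]
}

ranked <- rank_correlations(toy, "imb")
print(ranked)
```

This only illustrates the ranking logic; it makes no claim to reproduce the output or options of `corr_var`.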

Scatter Plots for TA Imbibition Water
Day of Season
Day of Season might be a confounder variable, because it is usually
highly correlated with the bagasse / fiber content of the sugar cane.
We inspect this correlation.
label_x <- "Dia.Zafra"
label_y <- "fibra.caña.ta"
mapa_dispersion(label_x,label_y,df_TA_filtered)
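To illustrate why a confounder matters, the sketch below builds hypothetical data in which imbibition water depends only on the day of season, while fiber content also drifts with the day. The apparent effect of fiber largely disappears once the day is included in the model. All variable names and coefficients here are invented for the illustration.

```r
set.seed(7)
n <- 60
day <- 1:n

# Hypothetical data: fiber drifts upward during the season,
# and imbibition water depends on the day only.
fiber <- 12 + 0.03 * day + rnorm(n, sd = 0.2)
imb   <- 150 + 1.2 * day + rnorm(n, sd = 5)
toy   <- data.frame(day, fiber, imb)

# Without the confounder, fiber appears to drive imbibition water.
naive <- lm(imb ~ fiber, data = toy)

# With the confounder in the model, the fiber coefficient shrinks.
adjusted <- lm(imb ~ fiber + day, data = toy)

print(coef(naive)["fiber"])
print(coef(adjusted)["fiber"])
```

Comparing a coefficient with and without the suspected confounder is one simple check; it does not prove causality in either direction.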

6th Mill TA Speed
label_x <- "ST55M601"
label_y <- "ImbTA"
mapa_dispersion(label_x,label_y,df_TA_filtered)

Sugar Cane Bagasse Content TA
label_x <- "bagazo...caña.ta"
label_y <- "ImbTA"
mapa_dispersion(label_x,label_y,df_TA_filtered)

Fiber Content Tandem A
label_x <- "X..Fibra.Core.TA"
label_y <- "ImbTA"
mapa_dispersion(label_x,label_y,df_TA_filtered)

Bagasse Mass Flow TA
label_x <- "WT555801"
label_y <- "ImbTA"
mapa_dispersion(label_x,label_y,df_TA_filtered)

Sugar Cane Trash Content Tandem A
label_x <- "X..Trash.Ponderado"
label_y <- "ImbTA"
mapa_dispersion(label_x,label_y,df_TA_filtered)

Torque on Mill 1 TA
label_x <- "TQ55M101"
label_y <- "ImbTA"
mapa_dispersion(label_x,label_y,df_TA_filtered)

Imbibition Water TA Linear Model
lm_ImbTA = lm(ImbTA ~ TQ55M101 + WT555801 + X..Fibra.Core.TA + ST55M601 + bagazo...caña.ta + Trash.Total...., data = df_TA_filtered) #Create the linear regression
summary(lm_ImbTA)
Call:
lm(formula = ImbTA ~ TQ55M101 + WT555801 + X..Fibra.Core.TA +
ST55M601 + bagazo...caña.ta + Trash.Total...., data = df_TA_filtered)
Residuals:
Min 1Q Median 3Q Max
-60.110 -24.433 2.742 23.406 47.965
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -577.0672 221.8002 -2.602 0.014873 *
TQ55M101 0.1227 0.0618 1.986 0.057233 .
WT555801 1.9592 0.7820 2.505 0.018569 *
X..Fibra.Core.TA -42.0349 17.0986 -2.458 0.020658 *
ST55M601 0.8991 0.2364 3.803 0.000744 ***
bagazo...caña.ta 18.4594 3.7877 4.874 4.28e-05 ***
Trash.Total.... -10.1793 4.3757 -2.326 0.027753 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 31.98 on 27 degrees of freedom
Multiple R-squared: 0.8399, Adjusted R-squared: 0.8043
F-statistic: 23.61 on 6 and 27 DF, p-value: 1.476e-09
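One possible way to read a fitted model of this kind as an operational reference is to ask what imbibition water it predicts for a given set of operating conditions, together with a prediction interval. The sketch below does this on hypothetical data with two invented predictors (`speed`, `bagasse_pct`); it is not the model fitted above, and whether a prediction bound is an appropriate constraint is a process decision outside the scope of the code.

```r
set.seed(3)
n <- 34

# Hypothetical high-speed operating days.
toy <- data.frame(
  speed       = runif(n, 1015, 1160),  # rpm
  bagasse_pct = runif(n, 26, 32)       # bagasse % cane
)
toy$imb <- -600 + 0.9 * toy$speed + 18 * toy$bagasse_pct + rnorm(n, sd = 30)

fit <- lm(imb ~ speed + bagasse_pct, data = toy)

# Conditions for which a reference value is wanted (assumed values).
new_day <- data.frame(speed = 1100, bagasse_pct = 29)

# Point prediction with a 95% prediction interval.
pred <- predict(fit, newdata = new_day, interval = "prediction", level = 0.95)
print(pred)
```

The result is a one-row matrix with columns `fit`, `lwr`, and `upr`; the width of the interval reflects the residual standard error, which is around 30 in both the toy data and the model summarized above.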
Imbibition Water TA Linear Model Test
# applying fitted values to TA data frame
df_TA_filtered$fitted<- lm_ImbTA$fitted.values
# creating ggplot object for visualization
lm_ImbTA_plot <- ggplot(df_TA_filtered, aes(x= Dia.Zafra, y= ImbTA,colour="real values")) +
geom_point() +
geom_line(aes(y= fitted, colour="fitted values")) +
scale_color_manual(name = "Imbibition Water TA", values = c("fitted values" = "darkblue", "real values" = "red"))
print(lm_ImbTA_plot)
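The plot above compares fitted and observed values on the same rows that were used to fit the model, so it shows in-sample fit. As a complementary sketch, a simple holdout split gives an out-of-sample error estimate. The data, the single `speed` predictor, and the 20-row test size are all hypothetical.

```r
set.seed(11)
n <- 80

# Hypothetical data with one predictor.
toy <- data.frame(speed = runif(n, 1015, 1160))
toy$imb <- 0.9 * toy$speed + rnorm(n, sd = 30)

# Hold out 20 rows for testing; fit on the rest.
test_idx <- sample(seq_len(n), size = 20)
train <- toy[-test_idx, ]
test  <- toy[test_idx, ]

fit <- lm(imb ~ speed, data = train)

# Root mean squared error helper.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

rmse_train <- rmse(train$imb, fitted(fit))
rmse_test  <- rmse(test$imb, predict(fit, newdata = test))
print(c(train = rmse_train, test = rmse_test))
```

With only a few dozen filtered rows, as in this analysis, a single split is noisy; repeated splits or cross-validation would be a natural extension.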

Tandem “B”
Density Function: 6th Mill TB Speed (rpm)
We first take a look at the Mill Speed Distribution, to identify the
speed range where the level control of the mill is nearly lost.
# PDF
distr(df_TB,'ST55N601')

# Boxplot
ggplot(df_TB, aes(y=ST55N601)) +
geom_boxplot(
# custom boxes
color="blue",
fill="blue",
alpha=0.2,
# Notch
notch=TRUE,
notchwidth = 0.8,
# custom outliers
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
scale_x_discrete() +
labs(title="Boxplot",x="", y = "ST55N601 (rpm)")

# Descriptive Statistics
data.frame(Estadistica=stat.desc(df_TB$ST55N601))
Maximum Operational Speed: 1200 rpm. Maximum Average Operational
Speed: 1160 rpm.
# Quartiles:
res<-quantile(df_TB$ST55N601, probs = c(0,0.25,0.5,0.75,1))
res
0% 25% 50% 75% 100%
593.0956 897.9540 965.2726 1012.9059 1165.1792
Density Function: 6th Mill TB Power (kW)
We examine the power distribution of the mill, as a proxy variable
for identifying an acceptable load range.
# PDF
distr(df_TB,'JT55N601')

# Boxplot
ggplot(df_TB, aes(y=JT55N601)) +
geom_boxplot(
# custom boxes
color="blue",
fill="blue",
alpha=0.2,
# Notch
notch=TRUE,
notchwidth = 0.8,
# custom outliers
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
scale_x_discrete() +
labs(title="Boxplot",x="", y = "JT55N601 (kW)")

# Descriptive Statistics
data.frame(Estadistica=stat.desc(df_TB$JT55N601))
Maximum Operational Power: 900 kW
# Quartiles:
res<-quantile(df_TB$JT55N601, probs = c(0,0.25,0.5,0.75,1))
res
0% 25% 50% 75% 100%
301.3789 542.7360 642.1705 722.1480 892.5287
Density Function: 6th Mill TB Chute Level (%)
We examine the chute level distribution of the mill.
# PDF
distr(df_TB,'LT55N601')

# Boxplot
ggplot(df_TB, aes(y=LT55N601)) +
geom_boxplot(
# custom boxes
color="blue",
fill="blue",
alpha=0.2,
# Notch
notch=TRUE,
notchwidth = 0.8,
# custom outliers
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
scale_x_discrete() +
labs(title="Boxplot",x="", y = "LT55N601 (%)")

# Descriptive Statistics
data.frame(Estadistica=stat.desc(df_TB$LT55N601))
# Quartiles:
res<-quantile(df_TB$LT55N601, probs = c(0,0.25,0.5,0.75,1))
res
0% 25% 50% 75% 100%
2.596137 24.073490 39.651763 48.249095 66.099806
# Filter Dataset by Column Values:
df_TB_filtered <- df_TB[df_TB$JT55N601>=542,] # Keep rows at or above the 1st quartile of power (about 542 kW)
df_TB_filtered <- df_TB_filtered[df_TB_filtered$ST55N601>=1012,] # Keep the top 25% of speeds (3rd quartile, about 1012 rpm)
df_TB_filtered
Largest Correlation Variables with TB Imbibition Water
corr_var(df_TB_filtered, # dataframe name
ImbTB, # target
max_pvalue = 0.05, # significance level
top = 15, # top n most correlated variables with target
plot = T
)

Day of Season
label_x <- "Dia.Zafra"
label_y <- "ImbTB"
mapa_dispersion(label_x,label_y,df_TB_filtered)

6th TB Mill Speed
label_x <- "ST55N601"
label_y <- "ImbTB"
mapa_dispersion(label_x,label_y,df_TB_filtered)

1st TB Mill Power
label_x <- "JT55N101"
label_y <- "ImbTB"
mapa_dispersion(label_x,label_y,df_TB_filtered)

Sugar Cane Fiber Content Tandem B
label_x <- "X..Fibra.Core.TB"
label_y <- "ImbTB"
mapa_dispersion(label_x,label_y,df_TB_filtered)

Imbibition Water TB Linear Model
lm_ImbTB = lm(ImbTB ~ Dia.Zafra + ST55N601 + TQ55N101_stdev + JT55N101_stdev + caña.molida.tb, data = df_TB_filtered) #Create the linear regression
summary(lm_ImbTB)
Call:
lm(formula = ImbTB ~ Dia.Zafra + ST55N601 + TQ55N101_stdev +
JT55N101_stdev + caña.molida.tb, data = df_TB_filtered)
Residuals:
Min 1Q Median 3Q Max
-22.83 5.04 19.66 38.66
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.440963 145.137904 0.905079
Dia.Zafra 1.228458 0.215136 5.710 2.27e-06 ***
ST55N601 0.532491 3.752 0.000676 ***
TQ55N101_stdev -0.829769 0.293859 -2.824 **
JT55N101_stdev 1.599298 0.657877 2.431 0.020654 *
caña.molida.tb 0.008479 0.003985 2.128 0.040893 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 29.32 on 33 degrees of freedom
Multiple R-squared: 0.7808, Adjusted R-squared: 0.7476
F-statistic: 23.51 on 5 and 33 DF, p-value: 5.28e-10
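Both linear models combine predictors that may be related to each other (for example, torque and power variability on the same mill), which can inflate coefficient standard errors. A common diagnostic is the variance inflation factor, VIF = 1 / (1 - R²), where R² comes from regressing each predictor on the others. The sketch below computes it in base R on hypothetical data; `vif_manual` and the `x1`, `x2`, `x3` columns are illustrative names, not part of the analysis above.

```r
set.seed(5)
n <- 50

# Hypothetical predictors: x2 is nearly a copy of x1, x3 is independent.
toy <- data.frame(x1 = rnorm(n))
toy$x2 <- toy$x1 + rnorm(n, sd = 0.1)
toy$x3 <- rnorm(n)

# VIF for each column: regress it on all the others and use 1 / (1 - R^2).
vif_manual <- function(data) {
  sapply(names(data), function(v) {
    others <- setdiff(names(data), v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(toy)
print(round(vifs, 2))
```

Large values for `x1` and `x2` and a value near 1 for `x3` are what this construction is meant to show; rules of thumb for what counts as "large" vary, so any cutoff is a judgment call.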
Imbibition Water TB Linear Model Test
# applying fitted values to TB data frame
df_TB_filtered$fitted<- lm_ImbTB$fitted.values
# creating ggplot object for visualization
lm_ImbTB_plot <- ggplot(df_TB_filtered, aes(x= Dia.Zafra, y= ImbTB,colour="real values")) +
geom_point() +
geom_line(aes(y= fitted, colour="fitted values")) +
scale_color_manual(name = "Imbibition Water TB", values = c("fitted values" = "darkblue", "real values" = "red"))
print(lm_ImbTB_plot)

