The 6th mill is the last mill in each of the two tandems, so the entire bagasse mass flow must pass through it (as with the 1st mill). When too much imbibition water is applied, chute level control is lost because even the top speed is not enough to displace the volume (bagasse + water) that must go through the mill. This high chute level condition increases the probability of blockages and mechanical damage to the bagasse feeders, which can cause significant downtime.

The purpose of this analysis is to find an operational constraint on the imbibition water applied to the 6th mills, based on process variables, to prevent damage and downtime.

To that end, a correlation analysis is conducted for Mill 6 on Tandem A (TA) and Tandem B (TB) for the 2023-2024 harvest season.

Libraries

library(dplyr)
library(pastecs)
library(ggplot2)
library(lares)

Scatter - Correlation Plot Function

mapa_dispersion <- function(label_x,label_y,dataset) {
  # Pearson correlation between the two columns, ignoring rows with missing values
  correlacion <- round(cor(dataset[[label_x]],dataset[[label_y]], use = "complete.obs"),2)
  # Scatter plot with a fitted linear trend and its confidence band
  ggplot(dataset, aes(.data[[label_x]], .data[[label_y]])) + 
    geom_point(
        color="orange",
        fill="#69b3a2",
        shape=21,
        alpha=0.5,
        size=6,
        stroke = 2
        ) +
    geom_smooth(method="lm", formula = y ~ x, color="#990000", fill="#FFCF00", se=TRUE) +
    ggtitle(paste(label_y,"vs.",label_x), subtitle = paste("Correlation:",correlacion)) +
    xlab(label_x) + ylab(label_y)
}

TA & TB Dataframes

The datasets used for this analysis include operational variables like mill speed, power, torque, chute level, etc., as well as variables related to mass flows like imbibition water, bagasse, fiber, etc. The variables are daily averages.

dataset <- read.csv(file = 'Dataset.csv')

# Dataset for each tandem
df_TA <- dataset[-c(4,5,8,9,12,13,16,17,20,21,24,25,31,32,33,34,35,38,39,41,43,45,47,49,51)]
df_TB <- dataset[c(1,4,5,8,9,12,13,16,17,20,21,24,25,31,32,33,34,35,38,39,41,43,45,47,49,51,52,53)]
df_TA
df_TB
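
The positional column indices above are tied to the exact layout of Dataset.csv. As a hedged alternative, the split could be driven by column names instead. The sketch below uses a toy data frame; the helper `select_tandem` and the name patterns (`55M` / `55N` tags, `TA` / `TB` suffixes) are assumptions inferred from the variable names used in this analysis, not a description of the real file.

```r
# Toy data frame; the column names mimic tags seen in this analysis,
# but the real Dataset.csv layout is an assumption.
toy <- data.frame(
  Dia.Zafra = 1:3,
  ST55M601  = c(900, 950, 1000),
  ST55N601  = c(910, 960, 1010),
  ImbTA     = c(200, 210, 220),
  ImbTB     = c(190, 205, 215)
)

# Hypothetical helper: keep the columns of one tandem plus the shared ones,
# i.e. drop only the columns that belong to the other tandem.
select_tandem <- function(df, own_pattern, other_pattern) {
  is_own   <- grepl(own_pattern, names(df))
  is_other <- grepl(other_pattern, names(df))
  df[, is_own | !is_other, drop = FALSE]
}

pat_TA <- "55M|TA$|\\.ta$"   # assumed naming convention for Tandem A
pat_TB <- "55N|TB$|\\.tb$"   # assumed naming convention for Tandem B

toy_TA <- select_tandem(toy, pat_TA, pat_TB)
toy_TB <- select_tandem(toy, pat_TB, pat_TA)
names(toy_TA)
names(toy_TB)
```

Selecting by name keeps working if columns are added or reordered in the source file, at the cost of depending on a consistent naming convention.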

Tandem “A”

Density Function: 6th Mill TA Speed (rpm)

We first take a look at the Mill Speed Distribution, to identify the speed range where the level control of the mill is nearly lost.


# PDF
distr(df_TA,'ST55M601')
Warning: Font 'Arial Narrow' is not installed, has other name, or can't be found

# Boxplot
ggplot(df_TA, aes(y=ST55M601)) + 
    geom_boxplot( 
        # custom boxes
        color="blue",
        fill="blue",
        alpha=0.2,
        
        # Notch
        notch=TRUE,
        notchwidth = 0.8,
        
        # custom outliers
        outlier.colour="red",
        outlier.fill="red",
        outlier.size=3) +
  scale_x_discrete() +
  labs(title="Boxplot",x="", y = "ST55M601 (rpm)")


# Descriptive Statistics
data.frame(Estadistica=stat.desc(df_TA$ST55M601))
NA

Maximum Operational Speed: 1200 rpm. Maximum Average Operational Speed: 1160 rpm.

# Quartiles:
res<-quantile(df_TA$ST55M601, probs = c(0,0.25,0.5,0.75,1)) 
res
       0%       25%       50%       75%      100% 
 560.6891  868.7622  941.8143 1014.9713 1156.3707 

Density Function: 6th Mill TA Power (kW)

We examine the power distribution of the mill as a proxy variable for identifying an acceptable load range.


# PDF
distr(df_TA,'JT55M601')


# Boxplot
ggplot(df_TA, aes(y=JT55M601)) + 
    geom_boxplot( 
        # custom boxes
        color="blue",
        fill="blue",
        alpha=0.2,
        
        # Notch
        notch=TRUE,
        notchwidth = 0.8,
        
        # custom outliers
        outlier.colour="red",
        outlier.fill="red",
        outlier.size=3) +
  scale_x_discrete() +
  labs(title="Boxplot",x="", y = "JT55M601 (kW)")


# Descriptive Statistics
data.frame(Estadistica=stat.desc(df_TA$JT55M601))
NA

Maximum Average Operational Power: 769 kW

# Quartiles:
res<-quantile(df_TA$JT55M601, probs = c(0,0.25,0.5,0.75,1)) 
res
      0%      25%      50%      75%     100% 
347.0196 489.0803 565.8324 660.6705 769.0661 

Density Function: 6th Mill TA Chute Level (%)

We examine the chute level distribution of the mill.


# PDF
distr(df_TA,'LT55M601')


# Boxplot
ggplot(df_TA, aes(y=LT55M601)) + 
    geom_boxplot( 
        # custom boxes
        color="blue",
        fill="blue",
        alpha=0.2,
        
        notch=TRUE,
        notchwidth = 0.8,
        
        # custom outliers
        outlier.colour="red",
        outlier.fill="red",
        outlier.size=3) +
  scale_x_discrete() +
  labs(title="Boxplot",x="", y = "LT55M601 (%)")


# Descriptive Statistics
data.frame(Estadistica=stat.desc(df_TA$LT55M601))
NA
# Quartiles:
res<-quantile(df_TA$LT55M601, probs = c(0,0.25,0.5,0.75,1)) 
res
      0%      25%      50%      75%     100% 
13.59563 21.51660 26.01151 31.23077 46.33880 

Dataframe Filtering

We filter observations for the following operational ranges:

  • Power Range for Normal Sugar Cane Milling.
  • Speed Range for High Chute Level Condition in the Mill.

# Filter Dataset by Column Values:
df_TA_filtered <- df_TA[df_TA$JT55M601>=489,] # Keep power at or above the 1st quartile (about 489 kW)
df_TA_filtered <- df_TA_filtered[df_TA_filtered$ST55M601>=1015,] # Keep speed at or above the 3rd quartile (about 1015 rpm), i.e. the top 25% of observed speeds
df_TA_filtered
NA
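
The thresholds above are typed in by hand from the quartile tables. A minimal sketch of the same filtering idea with the thresholds computed from the data is shown below. It runs on synthetic values; the column names `power` and `speed` and their ranges are placeholders, not the real tags.

```r
set.seed(1)
# Synthetic stand-in for a tandem data frame (values are made up)
toy <- data.frame(
  power = runif(200, 350, 770),   # kW
  speed = runif(200, 560, 1160)   # rpm
)

# Thresholds derived from the data instead of hard-coded
p_min <- quantile(toy$power, 0.25, na.rm = TRUE)  # 1st quartile of power
s_min <- quantile(toy$speed, 0.75, na.rm = TRUE)  # 3rd quartile of speed

# which() silently drops rows where the comparison evaluates to NA,
# whereas plain logical indexing would return all-NA rows for them
keep <- which(toy$power >= p_min & toy$speed >= s_min)
toy_filtered <- toy[keep, ]
nrow(toy_filtered)
```

Computing the cut-offs with `quantile()` keeps the filter consistent if the dataset is refreshed with a new season.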

Largest Correlation Variables with TA Imbibition Water

We inspect the variables most strongly correlated with imbibition water.

corr_var(df_TA_filtered, # dataframe name
  ImbTA, # target
  max_pvalue = 0.05, # significance level
  top = 15, # top n most correlated variables with target
  plot = T
)
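
For readers without the lares package, the general idea of ranking variables by their correlation with a target and discarding non-significant ones can be sketched in base R. This is only an illustration of the concept on synthetic data, not the implementation used by `corr_var`; the function name `rank_correlations` and all column names are hypothetical.

```r
set.seed(42)
n <- 40
# Synthetic data: "imb" depends on speed and fiber, "noise" is unrelated
toy <- data.frame(speed = rnorm(n), fiber = rnorm(n), noise = rnorm(n))
toy$imb <- 2 * toy$speed - 1 * toy$fiber + rnorm(n, sd = 0.5)

# Hypothetical helper: Pearson correlation of every numeric column with the
# target, filtered by p-value and sorted by absolute correlation
rank_correlations <- function(df, target, max_pvalue = 0.05) {
  predictors <- setdiff(names(df)[sapply(df, is.numeric)], target)
  res <- do.call(rbind, lapply(predictors, function(v) {
    ct <- cor.test(df[[v]], df[[target]])  # Pearson by default
    data.frame(variable = v, r = unname(ct$estimate), p = ct$p.value)
  }))
  res <- res[res$p <= max_pvalue, ]
  res[order(-abs(res$r)), ]
}

res <- rank_correlations(toy, "imb")
res
```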

Scatter Plots for TA Imbibition Water

Day of Season

Day of season might be a confounding variable, because it is usually highly correlated with the bagasse / fiber content of the sugar cane. We inspect this correlation.

label_x <- "Dia.Zafra"
label_y <- "fibra.caña.ta"

mapa_dispersion(label_x,label_y,df_TA_filtered)
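
One way to probe a suspected confounder numerically is a partial correlation: correlate two variables after removing the linear effect of the third from each. The sketch below uses synthetic data in which both series drift with the day of season; the variable names and coefficients are made up for illustration.

```r
set.seed(7)
n <- 60
day   <- 1:n                                   # stand-in for day of season
fiber <- 12 + 0.03 * day + rnorm(n, sd = 0.3)  # drifts with the season
imb   <- 150 + 1.2 * day + rnorm(n, sd = 10)   # also drifts with the season

# Raw correlation mixes any direct link with the shared seasonal trend
raw_r <- cor(fiber, imb)

# Partial correlation: correlate what is left after removing the day trend
partial_r <- cor(resid(lm(fiber ~ day)), resid(lm(imb ~ day)))

c(raw = raw_r, partial = partial_r)
```

If the partial correlation is much weaker than the raw one, as it is by construction here, most of the apparent relationship is carried by the shared trend rather than by a direct link.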

6th Mill TA Speed

label_x <- "ST55M601"
label_y <- "ImbTA"

mapa_dispersion(label_x,label_y,df_TA_filtered)

Sugar Cane Bagasse Content TA

label_x <- "bagazo...caña.ta"
label_y <- "ImbTA"

mapa_dispersion(label_x,label_y,df_TA_filtered)

Fiber Content Tandem A

label_x <- "X..Fibra.Core.TA"
label_y <- "ImbTA"

mapa_dispersion(label_x,label_y,df_TA_filtered)

Bagasse Mass Flow TA

label_x <- "WT555801"
label_y <- "ImbTA"

mapa_dispersion(label_x,label_y,df_TA_filtered)

Sugar Cane Trash Content Tandem A

label_x <- "X..Trash.Ponderado"
label_y <- "ImbTA"

mapa_dispersion(label_x,label_y,df_TA_filtered)

Torque on Mill 1 TA

label_x <- "TQ55M101"
label_y <- "ImbTA"

mapa_dispersion(label_x,label_y,df_TA_filtered)

Imbibition Water TA Linear Model

lm_ImbTA = lm(ImbTA ~ TQ55M101 + WT555801 + X..Fibra.Core.TA + ST55M601 + bagazo...caña.ta + Trash.Total...., data = df_TA_filtered) #Create the linear regression
summary(lm_ImbTA)

Call:
lm(formula = ImbTA ~ TQ55M101 + WT555801 + X..Fibra.Core.TA + 
    ST55M601 + bagazo...caña.ta + Trash.Total...., data = df_TA_filtered)

Residuals:
    Min      1Q  Median      3Q     Max 
-60.110 -24.433   2.742  23.406  47.965 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)      -577.0672   221.8002  -2.602 0.014873 *  
TQ55M101            0.1227     0.0618   1.986 0.057233 .  
WT555801            1.9592     0.7820   2.505 0.018569 *  
X..Fibra.Core.TA  -42.0349    17.0986  -2.458 0.020658 *  
ST55M601            0.8991     0.2364   3.803 0.000744 ***
bagazo...caña.ta   18.4594     3.7877   4.874 4.28e-05 ***
Trash.Total....   -10.1793     4.3757  -2.326 0.027753 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 31.98 on 27 degrees of freedom
Multiple R-squared:  0.8399,    Adjusted R-squared:  0.8043 
F-statistic: 23.61 on 6 and 27 DF,  p-value: 1.476e-09
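
One possible way to turn a fitted model like this into an operating guideline is to ask it for a prediction interval at given process conditions. The sketch below shows the mechanics with `predict()` on a small synthetic model; the predictors, coefficients and the example conditions are all hypothetical, and reading the interval as a constraint is an assumption rather than something established by the fit above.

```r
set.seed(3)
n <- 34
# Synthetic high-speed observations (values are made up)
toy <- data.frame(
  speed   = runif(n, 1015, 1160),  # rpm
  bagasse = runif(n, 24, 30)       # % cane
)
toy$imb <- -600 + 0.5 * toy$speed + 10 * toy$bagasse + rnorm(n, sd = 15)

fit <- lm(imb ~ speed + bagasse, data = toy)

# Conditions for which an estimate is wanted (hypothetical values)
new_day <- data.frame(speed = 1100, bagasse = 27)

# 95% prediction interval: columns are fit, lwr, upr
pred <- predict(fit, newdata = new_day, interval = "prediction", level = 0.95)
pred
```

A prediction interval reflects both coefficient uncertainty and residual scatter, so with a residual standard error of about 32 on 27 degrees of freedom it would be fairly wide for a model like the one above.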

Imbibition Water TA Linear Model Test

# applying fitted values to TA data frame
df_TA_filtered$fitted<- lm_ImbTA$fitted.values

# creating ggplot object for visualization
lm_ImbTA_plot <- ggplot(df_TA_filtered, aes(x= Dia.Zafra, y= ImbTA,colour="real values")) +
  geom_point() +
  geom_line(aes(y= fitted, colour="fitted values")) +
  scale_color_manual(name = "Imbibition Water TA", values = c("fitted values" = "darkblue", "real values" = "red"))

print(lm_ImbTA_plot)
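
Comparing fitted values against the same rows used for fitting gives an optimistic picture, especially with few observations. A hedged complement is a leave-one-out error estimate, which for ordinary least squares can be computed without refitting through the identity e_i / (1 - h_ii), where h_ii are the hat values. The sketch below uses synthetic data and placeholder names.

```r
set.seed(11)
n <- 34
# Synthetic regression problem (values are made up)
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 5 + 2 * toy$x1 - toy$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = toy)

# In-sample RMSE: same rows used for fitting and for evaluation
rmse_in <- sqrt(mean(resid(fit)^2))

# Leave-one-out residuals via the hat-matrix identity e_i / (1 - h_ii)
loo_res  <- resid(fit) / (1 - hatvalues(fit))
rmse_loo <- sqrt(mean(loo_res^2))

c(in_sample = rmse_in, leave_one_out = rmse_loo)
```

The leave-one-out RMSE is never smaller than the in-sample RMSE; a large gap between the two would suggest the model is leaning on a few influential days.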

Tandem “B”

Density Function: 6th Mill TB Speed (rpm)

We first take a look at the Mill Speed Distribution, to identify the speed range where the level control of the mill is nearly lost.


# PDF
distr(df_TB,'ST55N601')


# Boxplot
ggplot(df_TB, aes(y=ST55N601)) + 
    geom_boxplot( 
        # custom boxes
        color="blue",
        fill="blue",
        alpha=0.2,
        
        # Notch
        notch=TRUE,
        notchwidth = 0.8,
        
        # custom outliers
        outlier.colour="red",
        outlier.fill="red",
        outlier.size=3) +
  scale_x_discrete() +
  labs(title="Boxplot",x="", y = "ST55N601 (rpm)")


# Descriptive Statistics
data.frame(Estadistica=stat.desc(df_TB$ST55N601))
NA

Maximum Operational Speed: 1200 rpm. Maximum Average Operational Speed: 1160 rpm.

# Quartiles:
res<-quantile(df_TB$ST55N601, probs = c(0,0.25,0.5,0.75,1)) 
res
       0%       25%       50%       75%      100% 
 593.0956  897.9540  965.2726 1012.9059 1165.1792 

Density Function: 6th Mill TB Power (kW)

We examine the power distribution of the mill as a proxy variable for identifying an acceptable load range.


# PDF
distr(df_TB,'JT55N601')


# Boxplot
ggplot(df_TB, aes(y=JT55N601)) + 
    geom_boxplot( 
        # custom boxes
        color="blue",
        fill="blue",
        alpha=0.2,
        
        # Notch
        notch=TRUE,
        notchwidth = 0.8,
        
        # custom outliers
        outlier.colour="red",
        outlier.fill="red",
        outlier.size=3) +
  scale_x_discrete() +
  labs(title="Boxplot",x="", y = "JT55N601 (kW)")


# Descriptive Statistics
data.frame(Estadistica=stat.desc(df_TB$JT55N601))
NA

Maximum Operational Power: 900 kW

# Quartiles:
res<-quantile(df_TB$JT55N601, probs = c(0,0.25,0.5,0.75,1)) 
res
      0%      25%      50%      75%     100% 
301.3789 542.7360 642.1705 722.1480 892.5287 

Density Function: 6th Mill TB Chute Level (%)

We examine the chute level distribution of the mill.


# PDF
distr(df_TB,'LT55N601')


# Boxplot
ggplot(df_TB, aes(y=LT55N601)) + 
    geom_boxplot( 
        # custom boxes
        color="blue",
        fill="blue",
        alpha=0.2,
        
        # Notch
        notch=TRUE,
        notchwidth = 0.8,
        
        # custom outliers
        outlier.colour="red",
        outlier.fill="red",
        outlier.size=3) +
  scale_x_discrete() +
  labs(title="Boxplot",x="", y = "LT55N601 (%)")


# Descriptive Statistics
data.frame(Estadistica=stat.desc(df_TB$LT55N601))
NA
# Quartiles:
res<-quantile(df_TB$LT55N601, probs = c(0,0.25,0.5,0.75,1)) 
res
       0%       25%       50%       75%      100% 
 2.596137 24.073490 39.651763 48.249095 66.099806 

# Filter Dataset by Column Values:
df_TB_filtered <- df_TB[df_TB$JT55N601>=542,] # Keep power at or above the 1st quartile (about 542 kW)
df_TB_filtered <- df_TB_filtered[df_TB_filtered$ST55N601>=1012,] # Keep speed at or above the 3rd quartile (about 1012 rpm), i.e. the top 25% of observed speeds
df_TB_filtered
NA

Largest Correlation Variables with TB Imbibition Water

corr_var(df_TB_filtered, # dataframe name
  ImbTB, # target
  max_pvalue = 0.05, # significance level
  top = 15, # top n most correlated variables with target
  plot = T
)

Day of Season

label_x <- "Dia.Zafra"
label_y <- "ImbTB"

mapa_dispersion(label_x,label_y,df_TB_filtered)

6th TB Mill Speed

label_x <- "ST55N601"
label_y <- "ImbTB"

mapa_dispersion(label_x,label_y,df_TB_filtered)

1st TB Mill Power

label_x <- "JT55N101"
label_y <- "ImbTB"

mapa_dispersion(label_x,label_y,df_TB_filtered)

Sugar Cane Fiber Content Tandem B

label_x <- "X..Fibra.Core.TB"
label_y <- "ImbTB"

mapa_dispersion(label_x,label_y,df_TB_filtered)

Imbibition Water TB Linear Model

lm_ImbTB = lm(ImbTB ~ Dia.Zafra + ST55N601 + TQ55N101_stdev + JT55N101_stdev + caña.molida.tb, data = df_TB_filtered) #Create the linear regression
summary(lm_ImbTB)

Call:
lm(formula = ImbTB ~ Dia.Zafra + ST55N601 + TQ55N101_stdev + 
    JT55N101_stdev + caña.molida.tb, data = df_TB_filtered)

Residuals:
   Min     1Q Median     3Q    Max 
-22.83   5.04  19.66  38.66 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     17.440963 145.137904 0.905079    
Dia.Zafra        1.228458   0.215136   5.710 2.27e-06 ***
ST55N601         0.532491   3.752 0.000676 ***
TQ55N101_stdev  -0.829769   0.293859  -2.824 ** 
JT55N101_stdev   1.599298   0.657877   2.431 0.020654 *  
caña.molida.tb   0.008479   0.003985   2.128 0.040893 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 29.32 on 33 degrees of freedom
Multiple R-squared:  0.7808,    Adjusted R-squared:  0.7476 
F-statistic: 23.51 on 5 and 33 DF,  p-value: 5.28e-10

Imbibition Water TB Linear Model Test

# applying fitted values to TB data frame
df_TB_filtered$fitted<- lm_ImbTB$fitted.values

# creating ggplot object for visualization
lm_ImbTB_plot <- ggplot(df_TB_filtered, aes(x= Dia.Zafra, y= ImbTB,colour="real values")) +
  geom_point() +
  geom_line(aes(y= fitted, colour="fitted values")) +
  scale_color_manual(name = "Imbibition Water TB", values = c("fitted values" = "darkblue", "real values" = "red"))

print(lm_ImbTB_plot)

