knitr::include_graphics("energies-15-02061-g001-550.png")

1 FCC Refinery Optimisation

At the current oil refinery, there is a unit called Fluid Catalytic Cracking. This unit has a very important role in an oil refinery. This unit converts crude oil into gasoline by cracking. in addition to producing gasoline, this unit also produces coke. This coke must be burned in the regenerator to restore the catalyst function so that it can be re-circulated. in this case, an Engineer wants to optimize the regenerator so that the amount of coke yield is always in the expected range. The expected target is the optimum operating conditions. Those parameters are Total Dry Air to Regenerator, HBW Flow, Regenerator Temperature, and Catalyst per Oil ratio.

library(dplyr) # data transformation
library(lmtest) # linear model
library(olsrr) # outlier and laverage
library(caret) # NearZeroVar
library(car) #VIF

2 Data Preparation

At first, we have to import actual historical data from RCC unit.

fcc <- read.csv("FCC_Refinery_Data.csv")
head(fcc)

##         Date Total_Dry_Air_to_Regenerator HBW_Flow Regenerator_Temperature
## 1 2009-04-03                     261148.4   101.42                  617.68
## 2 2009-04-04                     438448.0   171.98                  713.93
## 3 2009-04-05                     448456.5   176.89                  717.09
## 4 2009-04-06                     460325.3   183.66                  717.95
## 5 2009-04-07                     482533.4   194.75                  724.77
## 6 2009-04-08                     484954.4   192.76                  723.33
##   Cat_per_Oil_ratio Percent_Coke_yield
## 1              6.53               9.00
## 2              7.76              10.96
## 3              7.85              11.05
## 4              7.63              10.98
## 5              6.93              10.81
## 6              6.87              10.77

colnames(fcc)

## [1] "Date"                         "Total_Dry_Air_to_Regenerator"
## [3] "HBW_Flow"                     "Regenerator_Temperature"     
## [5] "Cat_per_Oil_ratio"            "Percent_Coke_yield"

The variables that affect the amount of coke yield are :

Date : The date when the data taken.
Total_Dry_Air_to_Regenerator : Amount of dry air flow to regenerator (kg/h)
HBW_Flow : Amount of Hot Boiler Water to regeneator (Ton/h)
Regenerator_Temperature : Regenerator Temperature (degC)
Cat_per_Oil_ratio : Ratio of Catalyst to Oil in FCC unit (wt/wt)
Percent_Coke_yield : Percent of coke present in the catalyst (%)

3 Data Wrangling

In order to avoid bias in next data processing and modelling, the data shall be free of NA and negative value.

fcc %>% 
  is.na() %>% 
  colSums()

##                         Date Total_Dry_Air_to_Regenerator 
##                            0                           28 
##                     HBW_Flow      Regenerator_Temperature 
##                           26                           34 
##            Cat_per_Oil_ratio           Percent_Coke_yield 
##                          118                           56

There are 28 empty cells in Total_Dry_Air_to_Regenerator, 26 empty cells in HBW_Flow, 34 empty cells in Regenerator_Temperature, 118 empty cells in Cat_per_Oil_ratio and 56 empty cells in Percent_Coke_yield. Then the negative value shall be converted into NA.

fcc_clean <- fcc              # Duplicate data frame
fcc_clean[fcc_clean < 0] <- NA       # Replace negative values by NA

fcc_clean %>% 
  is.na() %>% 
  colSums()

##                         Date Total_Dry_Air_to_Regenerator 
##                           26                           30 
##                     HBW_Flow      Regenerator_Temperature 
##                           26                           34 
##            Cat_per_Oil_ratio           Percent_Coke_yield 
##                          118                           58

After convert some negative value, the number of NA in some variables increased. Then na.omit used to remove all NA value.

fcc_clean <- fcc_clean %>% 
  select(-Date) %>% 
  na.omit()

Then make sure all NA already removed and data is ready for further processing.

fcc_clean %>% 
  is.na() %>% 
  colSums()

## Total_Dry_Air_to_Regenerator                     HBW_Flow 
##                            0                            0 
##      Regenerator_Temperature            Cat_per_Oil_ratio 
##                            0                            0 
##           Percent_Coke_yield 
##                            0

The data distribution shall be checked in order to identify if any outlayer exist in data.

summary(fcc_clean)

##  Total_Dry_Air_to_Regenerator    HBW_Flow      Regenerator_Temperature
##  Min.   :146363               Min.   : 25.53   Min.   :477.3          
##  1st Qu.:409693               1st Qu.:131.54   1st Qu.:712.1          
##  Median :435519               Median :145.47   Median :716.2          
##  Mean   :430876               Mean   :144.50   Mean   :714.0          
##  3rd Qu.:464812               3rd Qu.:162.59   3rd Qu.:719.6          
##  Max.   :538074               Max.   :228.39   Max.   :733.7          
##  Cat_per_Oil_ratio Percent_Coke_yield
##  Min.   : 3.240    Min.   : 3.710    
##  1st Qu.: 7.305    1st Qu.: 9.440    
##  Median : 7.830    Median : 9.730    
##  Mean   : 7.758    Mean   : 9.835    
##  3rd Qu.: 8.170    3rd Qu.:10.250    
##  Max.   :10.320    Max.   :13.420

boxplot(fcc_clean$Total_Dry_Air_to_Regenerator)

boxplot(fcc_clean$HBW_Flow)

boxplot(fcc_clean$Regenerator_Temperature)

boxplot(fcc_clean$Cat_per_Oil_ratio)

boxplot(fcc_clean$Percent_Coke_yield)

As shown in boxplot above, the data contain some outlayer.

4 Building a LM Model

Simple Linear model used as initial observation. If the model fit above 80%, the LM model is considered sufficient to model the data. Here, all data considered important to create model.

model_linear <- lm(Percent_Coke_yield~., data=fcc_clean)
summary(model_linear)

## 
## Call:
## lm(formula = Percent_Coke_yield ~ ., data = fcc_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.50380 -0.19016 -0.02637  0.14909  1.98528 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -4.504e+00  4.490e-01  -10.03   <2e-16 ***
## Total_Dry_Air_to_Regenerator -1.115e-05  3.404e-07  -32.75   <2e-16 ***
## HBW_Flow                      3.403e-02  5.422e-04   62.77   <2e-16 ***
## Regenerator_Temperature       1.235e-02  6.885e-04   17.94   <2e-16 ***
## Cat_per_Oil_ratio             6.971e-01  1.451e-02   48.05   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3659 on 1534 degrees of freedom
## Multiple R-squared:  0.8172, Adjusted R-squared:  0.8167 
## F-statistic:  1714 on 4 and 1534 DF,  p-value: < 2.2e-16

Based on model summary above, the LM model fit about 81% of the data which is sufficient to model all data.

5 Model improvement

5.1 Outliers and Laverages removal

As mention above, the data still contain some outliers. Hence, the outliers tried to removed into improve the model fitting.

outlier <- ols_plot_resid_lev(model = model_linear)

As shown figure above, the some outliers and laverages in data and below the index of outlier and laverage.

eliminate <- outlier$data$txt
eliminate_col <- complete.cases(eliminate) 
eliminate_col <- eliminate[eliminate_col]
eliminate_col

##   [1]    1   44  139  140  241  242  269  270  327  328  329  330  331  332  333
##  [16]  334  335  336  337  338  339  340  341  342  343  344  345  346  347  348
##  [31]  349  350  351  352  353  354  355  356  357  362  363  364  365  366  367
##  [46]  368  369  377  378  379  380  381  382  383  384  385  386  387  388  404
##  [61]  405  492  642  643  644  645  647  648  649  650  651  652  653  654  655
##  [76]  667  668  728 1026 1027 1177 1213 1223 1224 1225 1226 1227 1228 1232 1233
##  [91] 1234 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333
## [106] 1334 1335 1336 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351
## [121] 1352 1353 1354 1355 1356 1357 1380 1394 1395 1431 1432 1433 1434 1435 1436

The outlier and laverage index above used to remove outliers and laverages value from dataframe.

fcc_clean2 <- fcc_clean[-eliminate_col, ]
fcc_clean2 %>% 
  count()

##      n
## 1 1404

1657-1404

## [1] 253

There are 253 value identified as outliers and laverages which already removed from dataframe. Then re-do the LM model by using data which already from outliers and laverages.

model_linear2 <- lm(Percent_Coke_yield~., data=fcc_clean2)
summary(model_linear2)

## 
## Call:
## lm(formula = Percent_Coke_yield ~ ., data = fcc_clean2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.66451 -0.13129 -0.01967  0.11013  1.03273 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -6.352e+00  6.866e-01  -9.252   <2e-16 ***
## Total_Dry_Air_to_Regenerator -1.122e-05  2.642e-07 -42.453   <2e-16 ***
## HBW_Flow                      3.076e-02  3.947e-04  77.933   <2e-16 ***
## Regenerator_Temperature       1.802e-02  9.677e-04  18.618   <2e-16 ***
## Cat_per_Oil_ratio             4.764e-01  1.016e-02  46.903   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2082 on 1399 degrees of freedom
## Multiple R-squared:  0.881,  Adjusted R-squared:  0.8806 
## F-statistic:  2589 on 4 and 1399 DF,  p-value: < 2.2e-16

The Adjusted R-squared from second LM model above improved to 88%.

6 Assumption Check

6.1 Near Zero Variance Check

After removing some outliers and laverages, the variance of data shall be checked to avoid near zero variance.

fcc_clean2 %>% 
  nearZeroVar()

## integer(0)

There is no tend of data to zero variance.

7 Multicollinearity check

vif(model_linear2)

## Total_Dry_Air_to_Regenerator                     HBW_Flow 
##                     3.161156                     3.423110 
##      Regenerator_Temperature            Cat_per_Oil_ratio 
##                     1.376814                     1.102831

The VIF value of all variables below 10, in other words thers is no Multicollinearity between variables.

shapiro.test(model_linear2$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  model_linear2$residuals
## W = 0.96802, p-value < 2.2e-16

plot(model_linear2$residuals)

Based on the Shapiro test above, p-value is below 0.05 which means that the residual not distributed evenly or not normal.

bptest(model_linear2)

## 
##  studentized Breusch-Pagan test
## 
## data:  model_linear2
## BP = 17.667, df = 4, p-value = 0.001433

Based on the bptest above, p-value is below 0.05 which means there are heteroscedasticity in data.

8 Model Summary

Percent_Coke_yield = 4.764e-01 * Cat_per_Oil_ratio + 1.802e-02 * Regenerator_Temperature + 3.076e-02 * HBW_Flow - 1.122e-05 * Total_Dry_Air_to_Regenerator - 6.352e+00

In actual case, even if some parameters have strong correlation to coke yield, some parameter could changed easly due to rating issues of pump and product distributions. Now,there are some constrains related to catalyst per oil ratio and HBW Flow.

Current catalyst per oil ratio already gave maximum gasoline production, and pump flow of HBW could not changed due to steam balance in the unit. Hence, catalyst per oil ratio shall be fixed at 8 and rated flow of HBW pump is 125 Ton/h. In order to maintain the heat balance in regenerator-reactor, the yeid of Coke shall be minimum at 9.7.

Hence, only regenerator temperature and Total dry air regenerator have possibility to adjust and the equation now became :

8.4 = 1.802e-02 * Regenerator_Temperature - 1.122e-05 * Total_Dry_Air_to_Regenerator

if normal operating temperature of regenerator 750 degC and the material limitation of regenerator is 850 degC, what is the total dry air to regenerator shall be set by the operator ? at 750 degC, the total dry air flow shall be 456256.68 kg/h and while at 850 degC the total dry air flow shall be maximum at 616862.75 kg/h.

9 Conclusion

The LM model successfully built to modelling the regenerator in FCC unit.
Based on assumption test, such as VIF, shapiro and bptest shown that the data almost fulfill all assumptions.
The equation extracted from LM model is : Percent_Coke_yield = 4.764e-01 * Cat_per_Oil_ratio + 1.802e-02 * Regenerator_Temperature + 3.076e-02 * HBW_Flow - 1.122e-05 * Total_Dry_Air_to_Regenerator - 6.352e+00.
The flow of total dry air flow shall be maintained at 456256.68 kg/h and regenerator temperature shall be maintained at 750 degC.

MACHINE LEARNING AS TOOLS FOR OPTIMIZATION CHEMICAL & REFINERY PROCESSES : FCC UNIT OPTIMIZATION

affandry taufik

2023-04-26