knitr::include_graphics("energies-15-02061-g001-550.png")

In a modern oil refinery, there is a unit called Fluid Catalytic Cracking (FCC), which plays a very important role. This unit converts heavy oil fractions into gasoline by cracking. In addition to gasoline, it also produces coke. This coke must be burned off in the regenerator to restore the catalyst's activity so that the catalyst can be recirculated. In this case, an engineer wants to optimize the regenerator so that the coke yield always stays within the expected range. The target is the set of optimum operating conditions for four parameters: Total Dry Air to Regenerator, HBW Flow, Regenerator Temperature, and Catalyst per Oil ratio.
library(dplyr) #data wrangling
library(caret) #preProcess
library(caTools) #sample.split
library(neuralnet) # model nn
library(NeuralNetTools) #NeuralNetTools
require(devtools)
source_gist('6206737') #gar.fun
library(lmtest) # linear model
library(olsrr) # outliers and leverage

First, we import actual historical data from the RCC unit.
fcc <- read.csv("FCC_Refinery_Data.csv")
head(fcc)
##         Date Total_Dry_Air_to_Regenerator HBW_Flow Regenerator_Temperature
## 1 2009-04-03                     261148.4   101.42                  617.68
## 2 2009-04-04                     438448.0   171.98                  713.93
## 3 2009-04-05                     448456.5   176.89                  717.09
## 4 2009-04-06                     460325.3   183.66                  717.95
## 5 2009-04-07                     482533.4   194.75                  724.77
## 6 2009-04-08                     484954.4   192.76                  723.33
##   Cat_per_Oil_ratio Percent_Coke_yield
## 1              6.53               9.00
## 2              7.76              10.96
## 3              7.85              11.05
## 4              7.63              10.98
## 5              6.93              10.81
## 6              6.87              10.77
colnames(fcc)
## [1] "Date"                         "Total_Dry_Air_to_Regenerator"
## [3] "HBW_Flow"                     "Regenerator_Temperature"
## [5] "Cat_per_Oil_ratio"            "Percent_Coke_yield"
The variables that affect the amount of coke yield are Total_Dry_Air_to_Regenerator, HBW_Flow, Regenerator_Temperature, and Cat_per_Oil_ratio.
To avoid bias in the subsequent data processing and modelling, the data must be free of NA and negative values.
fcc %>%
  is.na() %>%
  colSums()
##                         Date Total_Dry_Air_to_Regenerator
##                            0                           28
##                     HBW_Flow      Regenerator_Temperature
##                           26                           34
##            Cat_per_Oil_ratio           Percent_Coke_yield
##                          118                           56
There are 28 empty cells in Total_Dry_Air_to_Regenerator, 26 in HBW_Flow, 34 in Regenerator_Temperature, 118 in Cat_per_Oil_ratio, and 56 in Percent_Coke_yield. Next, negative values are converted into NA.
fcc_clean <- fcc # Duplicate data frame
fcc_clean[fcc_clean < 0] <- NA # Replace negative values by NA

fcc_clean %>%
  is.na() %>%
  colSums()
##                         Date Total_Dry_Air_to_Regenerator
##                           26                           30
##                     HBW_Flow      Regenerator_Temperature
##                           26                           34
##            Cat_per_Oil_ratio           Percent_Coke_yield
##                          118                           58
After converting the negative values, the number of NA values in some variables has increased. na.omit is then used to remove every row that contains an NA value.
fcc_clean <- fcc_clean %>%
  select(-Date) %>%
  na.omit()

Then we make sure that all NA values have been removed and the data are ready for further processing.
fcc_clean %>%
  is.na() %>%
  colSums()
## Total_Dry_Air_to_Regenerator                     HBW_Flow
##                            0                            0
##      Regenerator_Temperature            Cat_per_Oil_ratio
##                            0                            0
##           Percent_Coke_yield
##                            0
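The cleaning steps above can be sketched on a small made-up data frame using base R only. This is an illustration under stated assumptions, not the code used for the refinery data: the `toy` values are invented, and, unlike the whole-data-frame comparison used earlier, the replacement is restricted to numeric columns so that the Date column is left untouched.

```r
# Toy stand-in for the refinery data (values are made up for illustration)
toy <- data.frame(
  Date = c("2009-04-03", "2009-04-04", "2009-04-05", "2009-04-06", "2009-04-07"),
  HBW_Flow = c(101.42, -5, 176.89, NA, 194.75),
  Percent_Coke_yield = c(9.00, 10.96, -1, 10.98, 10.81),
  stringsAsFactors = FALSE
)

# Replace negatives with NA in numeric columns only, leaving Date untouched
num_cols <- vapply(toy, is.numeric, logical(1))
toy[num_cols] <- lapply(toy[num_cols], function(x) replace(x, !is.na(x) & x < 0, NA))

# Drop the Date column, then every row that still contains an NA
toy_clean <- na.omit(toy[, names(toy) != "Date"])

# Confirm that no NA is left
na_left <- colSums(is.na(toy_clean))
print(na_left)
```

Only the rows that are complete after the conversion survive, which is the same effect `na.omit()` has on fcc_clean.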
The data are first rescaled to the range 0 to 1 with preProcess(method = "range") and then split into two parts: data for training and data for testing. The training set takes 90% of the data and the test set the remaining 10%.
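As a rough illustration of these two steps, the sketch below does min-max scaling and a 90/10 split in base R on made-up data. The names `toy`, `rescale01`, `train_toy`, and `test_toy` are hypothetical, and a plain random split by row index stands in for `sample.split`, which may choose rows differently.

```r
set.seed(123)  # set the seed BEFORE sampling so the split is reproducible

# Made-up data standing in for the cleaned refinery data
toy <- data.frame(
  x = c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20),
  y = c(9.0, 9.5, 10.1, 10.4, 10.8, 10.9, 11.0, 11.2, 11.5, 12.0)
)

# Min-max scaling: (value - min) / (max - min), column by column
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
toy_norm <- as.data.frame(lapply(toy, rescale01))

# Random 90/10 split by row index
n <- nrow(toy_norm)
train_idx <- sample(seq_len(n), size = round(0.9 * n))
train_toy <- toy_norm[train_idx, ]
test_toy  <- toy_norm[-train_idx, ]
```

Calling `set.seed()` before the sampling step is what makes the partition repeatable from one run to the next.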
process <- preProcess(as.data.frame(fcc_clean), method = c("range"))
fcc_norm <- predict(process, as.data.frame(fcc_clean))

sample <- sample.split(fcc_norm$Percent_Coke_yield, SplitRatio = 0.9)
train <- subset(fcc_norm, sample == TRUE)
test <- subset(fcc_norm, sample == FALSE)

The model used is a neural network, because the number of hidden layers and the number of nodes in each layer can be adjusted, which gives flexibility to improve model accuracy. Here four hidden layers are used: the first consists of 10 nodes, the second of 8 nodes, the third of 4 nodes, and the fourth of 1 node.
set.seed(123)
model_nn <- neuralnet(Percent_Coke_yield ~ ., train, hidden = c(10, 8, 4, 1))

plotnet(model_nn)

# Despite its name, result_train holds the predictions for the test set
result_train <- compute(model_nn, test[, -5])
postResample(result_train$net.result, test[, 5])
##       RMSE   Rsquared        MAE
## 0.02566209 0.81625296 0.01754204
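The NN model gave an Rsquared of 92%, higher than the LM model, which gave a maximum of 88%. The run shown above reports a lower value of 0.816 (about 82%): sample.split is called before set.seed(123), so the train/test split, and with it the metrics, changes from run to run.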
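To make the three reported metrics concrete, here is a minimal base R sketch of how they can be computed by hand. It assumes, to the best of our understanding of caret, that `postResample` reports Rsquared as the squared Pearson correlation between predictions and observations; the `obs` and `pred` vectors are made up.

```r
# Made-up observed and predicted values, for illustration only
obs  <- c(0.20, 0.40, 0.60, 0.80)
pred <- c(0.25, 0.35, 0.65, 0.75)

rmse <- sqrt(mean((pred - obs)^2))  # root mean squared error
rsq  <- cor(pred, obs)^2            # squared correlation (assumed caret convention)
mae  <- mean(abs(pred - obs))       # mean absolute error

print(c(RMSE = rmse, Rsquared = rsq, MAE = mae))
```

Because the data were rescaled to the range 0 to 1 before modelling, RMSE and MAE are expressed in normalized units rather than in percent coke yield.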
Removing outliers and leverage points is one way to improve model performance. Here, outliers and leverage points are identified with the ols_plot_resid_lev function and then removed.
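The idea behind a residual-versus-leverage screen can be sketched in base R as follows. This is a hedged illustration, not the olsrr implementation: it uses the built-in `mtcars` data as a stand-in, and the cutoffs (`lev_cut`, `res_cut`) are common rules of thumb that may differ from the thresholds olsrr applies.

```r
# Built-in dataset used as a stand-in for the refinery data
fit <- lm(mpg ~ wt + hp, data = mtcars)

lev  <- hatvalues(fit)  # leverage of each observation
rstu <- rstudent(fit)   # studentized (deleted) residuals

# Rule-of-thumb cutoffs (assumptions; olsrr's own thresholds may differ)
p <- length(coef(fit))
n <- nrow(mtcars)
lev_cut <- 2 * p / n
res_cut <- 2

# Keep only observations that are neither high-leverage nor outlying
keep <- !(lev > lev_cut | abs(rstu) > res_cut)
mtcars_trim <- mtcars[keep, ]
```

Filtering with a logical `keep` vector, rather than negative row indices, also behaves sensibly when no observation is flagged.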
model_linear <- lm(Percent_Coke_yield ~ ., data = fcc_clean)
outlier <- ols_plot_resid_lev(model = model_linear)

# txt is assumed to hold the row index of each flagged observation (NA otherwise)
eliminate <- outlier$data$txt
eliminate_col <- complete.cases(eliminate)
eliminate_col <- eliminate[eliminate_col]
fcc_clean2 <- fcc_clean[-eliminate_col, ]

After obtaining a data frame that is free of outliers and leverage points, we repeat the same steps as above.
process2 <- preProcess(as.data.frame(fcc_clean2), method = c("range"))
fcc_norm2 <- predict(process2, as.data.frame(fcc_clean2))

sample2 <- sample.split(fcc_norm2$Percent_Coke_yield, SplitRatio = 0.9)
train2 <- subset(fcc_norm2, sample2 == TRUE)
test2 <- subset(fcc_norm2, sample2 == FALSE)

set.seed(123)
model_nn2 <- neuralnet(Percent_Coke_yield ~ ., train2, hidden = c(10, 8, 4, 1))

plotnet(model_nn2)

result_train2 <- compute(model_nn2, test2[, -5])
postResample(result_train2$net.result, test2[, 5])
##       RMSE   Rsquared        MAE
## 0.03602461 0.92620488 0.02657743
The model improved! In the run shown above, the Rsquared of the NN model trained on data free of outliers and leverage points is 0.926, up from 0.816 before their removal.
To check which variables have a significant effect in the NN model, the gar.fun function is used to visualize the relative importance of each variable.
gar.fun('Percent_Coke_yield', model_nn2)

As the plot shows, Total Dry Air to Regenerator has a negative effect on coke yield. In contrast, the Cat/Oil ratio has the largest positive effect, followed by HBW Flow. Regenerator Temperature, however, has no significant effect on coke yield.
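For readers curious about how a weight-based importance score can be derived, below is a minimal sketch of the classic Garson formulation for a network with a single hidden layer. It is not the code from the gist: the function name `garson_importance`, the weight matrix `W`, and the vector `v` are hypothetical, and this classic form gives unsigned shares that sum to 1, whereas the plot discussed above shows signed effects.

```r
# Classic Garson relative importance for one hidden layer (illustrative sketch)
# W: matrix of input-to-hidden weights, one row per input, one column per hidden node
# v: vector of hidden-to-output weights, one entry per hidden node
garson_importance <- function(W, v) {
  # Share of each input in the absolute weights entering each hidden node
  share <- sweep(abs(W), 2, colSums(abs(W)), "/")
  # Weight each hidden node by |v| and sum over hidden nodes
  contrib <- share %*% abs(v)
  # Normalize so the importances sum to 1
  drop(contrib / sum(contrib))
}

# Made-up weights: two inputs, two hidden nodes
W <- rbind(x1 = c(1, 3), x2 = c(1, 1))
v <- c(1, 1)
imp <- garson_importance(W, v)
print(imp)
```

With several hidden layers, as in the model above, the weights have to be propagated through each layer in turn, so this single-layer sketch is only a starting point.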