Study of Well Segmentation Using Self-Organizing Maps (Kohonen Maps) in Unsupervised ML Mode

What is Well Segmentation
Study Objectives
What is Unsupervised Learning
What is a Self-Organizing Map in Unsupervised Learning?
Benefits
Additional Benefits
Load required packages
Reading a cleaned dataset - statistically significant predictors
Transforming dataset
Setup Training and Test data
Setup Grid and SOM Model
Hierarchical Clustering and WCSS
SOM Visualization
Assigning SOM Unit to each observation in the Training set
Running prediction with the Testing dataset / Saving SOM units
Displaying cluster on every well in the training/testing set
Using Plotly only for illustration purposes
List of potential candidates for refrac/stimulation - Subject to Production Engineer judgement
Predict whether a well is a potential candidate

A study to create segments of oil-producing wells with similar signatures in production performance behavior. This type of analysis could benefit engineers who screen possibly thousands of wells in their field for workover or re-stimulation candidates.

The Self-Organizing Map (SOM) model is a special class of artificial neural network based on competitive learning: the neurons, arranged on a lattice, compete to represent each input pattern, and the winning neuron and its lattice neighbors are adjusted toward that input. A self-organizing map is therefore characterized by the formation of a topographic map of the input patterns, in which the spatial locations (i.e., coordinates) of the neurons in the lattice are indicative of intrinsic statistical features contained in the input patterns; hence the name “self-organizing map.” The self-organizing map is inherently nonlinear.
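To make competitive learning concrete, here is a minimal base-R sketch of a single online SOM update: the node closest to an input wins (competition), the winner's lattice neighbours share the update through a Gaussian neighbourhood (cooperation), and every node moves toward the input in proportion to that weight (adaptation). The 5 x 5 rectangular lattice, the Gaussian neighbourhood and the names `som_step`, `alpha` and `radius` are assumptions for illustration; this is not the `kohonen` package's internal implementation.

```r
# Minimal sketch of one online SOM update in base R (illustration only,
# not the kohonen package's internal code)
set.seed(1)
toy_grid  <- expand.grid(x = 1:5, y = 1:5)   # lattice coordinates of 25 nodes
toy_codes <- matrix(runif(25 * 3), ncol = 3) # one 3-feature codebook vector per node

som_step <- function(codes, grid, x, alpha, radius) {
  # Competition: the best-matching unit (BMU) is the node closest to x
  d <- colSums((t(codes) - x)^2)
  bmu <- which.min(d)
  # Cooperation: Gaussian neighbourhood weight from lattice distance to the BMU
  lattice_d2 <- (grid$x - grid$x[bmu])^2 + (grid$y - grid$y[bmu])^2
  h <- exp(-lattice_d2 / (2 * radius^2))
  # Adaptation: move every node toward x in proportion to alpha * h
  target <- matrix(x, nrow = nrow(codes), ncol = length(x), byrow = TRUE)
  list(codes = codes + alpha * h * (target - codes), bmu = bmu)
}

x_in <- c(0.9, 0.1, 0.5)
step <- som_step(toy_codes, toy_grid, x_in, alpha = 0.05, radius = 1.5)
step$bmu   # index of the winning node
```

In a full training run this step is repeated for every observation over many passes while the learning rate and neighbourhood radius shrink, so early passes organise the map globally and later passes fine-tune individual nodes.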

Joint effort between CPQ Energy & Analytics and CSE Icon

What is Well Segmentation

Dividing the producing field (a sample of wells) into groups on the basis of significant features, which can help a company streamline its candidate-selection process for workovers.

Study Objectives

Create segments (clusters) of oil-producing wells with similar signatures in production performance behavior. Potential predictors or features: best 90 producing days, downtime, bottom-hole flowing pressure (BHFP), pressure gradient, reservoir pressure, well density, estimated ultimate recovery (EUR), decline rate (e.g., using material balance time or square root of time), proppant concentration, initial production (IP) rate, total organic carbon (TOC), thermal maturity, NPV and IRR, among others.

What is Unsupervised Learning

We want to explore the data to find intrinsic structures and relationships. The data have no target attribute, i.e., there is no known answer to predict. Unsupervised learning is often performed as part of exploratory data analysis.

What is a Self-Organizing Map in Unsupervised Learning?

A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map.

Benefits

This type of analysis could benefit engineers who screen possibly thousands of wells in their field for workover/re-stimulation candidates, reducing the time and money needed to define a sound list of potential candidates.

Additional Benefits

Dividing the producing field (a sample of wells) on the basis of significant features facilitates quick characterization.
A trained model can be reused with new wells in the same basin and can serve as a ‘starter’ in other basins with a similar ‘signature’.
Because the model is ‘unsupervised’, its output (the cluster assigned to each well) can serve as input to a subsequent ‘supervised’ algorithm once the answer is known for existing observations.
The goal of unsupervised learning is to ‘discover’ interesting things about the observations and, consequently, about the producing field.
SOMs are used extensively in market analysis, healthcare and banking and, in recent years, by major Oil & Gas service companies to support reservoir studies, identify production optimization patterns, etc.

knitr::opts_chunk$set(echo = TRUE)

Load required packages

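library(readxl)       # read Excel files
library(tidyverse)    # dplyr, readr, tibble, ggplot2, stringr, ...
library(stringr)      # string manipulation (also attached by tidyverse)
library(forecast)     # Moving Average
library(imputeTS)     # Time Series Missing Value Imputation
library(kohonen)      # Self-Organizing Maps
library(RcppRoll)     # fast rolling-window statistics
library(matrixStats)  # fast row/column statistics on matrices
library(DT)           # interactive HTML tables
library(cowplot)      # arranging ggplot2 figures
library(RColorBrewer) # colour palettes
library(fields)       # spatial data and plotting tools
library(latticeExtra) # extensions to lattice graphics
library(deldir)       # Delaunay triangulation and Dirichlet (Voronoi) tessellation
library(ggplot2)      # plotting (also attached by tidyverse)
library(plotly)       # interactive plots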

Reading a cleaned dataset - statistically significant predictors

# Reading dataset.csv

dataset <- read.csv("dataset.csv")

Transforming dataset

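# flag outliers: WOR and GOR values above the upper boxplot whisker
# (the largest value no more than 1.5 * IQR above the upper hinge) are set to NA
dataset$wor[which(dataset$wor > boxplot.stats(dataset$wor)$stats[5])] <- NA
dataset$gor[which(dataset$gor > boxplot.stats(dataset$gor)$stats[5])] <- NA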

# convert all '0' into 'NA'
dataset[dataset == 0] <- NA

# consider only complete cases (observations)
dataset <- dataset[complete.cases(dataset),]

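# cumulative barrels of oil equivalent (BOE); assumes cumgas is in scf,
# converted at 6,000 scf of gas per BOE
dataset$boecum <- dataset$cumoil + dataset$cumgas/6000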
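The outlier filter above relies on `boxplot.stats()$stats[5]`, which is the upper whisker of a boxplot, not a percentile. The hypothetical toy vector below (unrelated to the well data) shows which values such a filter would set to `NA`.

```r
# Toy data (hypothetical): nine ordinary values and one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
s <- boxplot.stats(x)
s$stats          # lower whisker, lower hinge, median, upper hinge, upper whisker
                 # 1.0 3.0 5.5 8.0 9.0
upper <- s$stats[5]
flagged <- which(x > upper)   # positions the outlier filter would set to NA
x[flagged]       # 100
```

Here the hinges are 3 and 8, so the upper fence is 8 + 1.5 * 5 = 15.5; the whisker stops at the largest value inside the fence (9), and only 100 is flagged.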

Setup Training and Test data

dt <- dataset
# keep uwi plus the eight predictors used by the SOM (column positions in dataset.csv)
dt <- dt[,c(1,4,10,13,17,18,19,20,23)]

# randomly assign 80% of the wells to the training set
set.seed(123)
rowtrain <- sample(nrow(dt), round(0.8 * nrow(dt)), replace = FALSE)

# Unsupervised learning does not strictly need a test set; 20% of the wells are held out
# to check how unseen wells map onto the trained SOM, and in case supervised learning is needed later

train <- dt[rowtrain, c(2:9)]
test <- dt[-rowtrain, c(2:9)]

dt[rowtrain, "settype"] <- "Train"
dt[-rowtrain, "settype"] <- "Test"

Setup Grid and SOM Model

This section trains the SOM using the Kohonen method.

# ------------------- SOM TRAINING ---------------------------

# Standardise the training data (z-scores); scale() keeps the column names of train
data_train_matrix <- as.matrix(scale(train))

# Scale the test data with the training means and standard deviations
data_test <- scale(test, 
                   center = attr(data_train_matrix, "scaled:center"), 
                   scale = attr(data_train_matrix, "scaled:scale"))

data_test_matrix <- as.matrix(data_test)

som_grid <- somgrid(xdim = 10, ydim = 10, topo="hexagonal")

# Train the SOM model!
#
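# rlen:  number of times the complete training set is presented to the network
# alpha: learning rate, declining linearly from 0.05 to 0.01 over the rlen updates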
#
system.time(som_model <- som(data_train_matrix, 
                             grid=som_grid, 
                             rlen=200, 
                             alpha=c(0.05,0.01), 
                             keep.data = TRUE)
            )
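The test wells must be centred and scaled with the training set's means and standard deviations; otherwise they would be expressed in a different coordinate system from the codebook vectors. The sketch below uses two small hypothetical data frames (`toy_train`, `toy_test`) to show that passing the training attributes to `scale()`, as in the chunk above, is the same as subtracting the training means and dividing by the training standard deviations.

```r
# Hypothetical training and test data with two features
toy_train <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))
toy_test  <- data.frame(a = c(2.5, 5),     b = c(25, 50))

train_s <- scale(toy_train)                               # z-scores from training statistics
test_s  <- scale(toy_test,
                 center = attr(train_s, "scaled:center"), # training means
                 scale  = attr(train_s, "scaled:scale"))  # training standard deviations

# Same result computed by hand
manual <- sweep(as.matrix(toy_test), 2, colMeans(toy_train), "-")
manual <- sweep(manual, 2, apply(toy_train, 2, sd), "/")
test_s
```

The first test row equals the training means, so it maps to zeros, i.e. the centre of the scaled training data.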

Hierarchical Clustering and WCSS

This section clusters the SOM results and uses the within-cluster sum of squares (WCSS) to estimate the number of clusters.

my.par <- par(mfrow=c(1, 2))

# -------------------- SOM VISUALISATION -----------------

#Visualise the SOM model results
# Plot of the training progress - how the node distances have stabilised over time.

coolBlueHotRed <- function(n, alpha = 1) {
  rainbow(n, end=4/6, alpha=alpha)[n:1]
}

# Colour palette definition
pretty_palette <- c("#1f77b4", '#2ca02c', '#ff7f0e', '#d62728', '#9467bd', '#8c564b', '#e377c2')
# ------------ Strong blue, Dark lime green, Vivid orange, Strong red, Slightly desaturated violet, Dark moderate red, Soft pink

# show the WCSS metric for kmeans for different clustering sizes.
# Can be used as a "rough" indicator of the ideal number of clusters
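# Codebook vectors: a matrix in kohonen 2.x, a list of matrices (one per data layer) in kohonen 3.x
mydata <- if (is.list(som_model$codes)) som_model$codes[[1]] else som_model$codes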
wss <- (nrow(mydata)-1)*sum(apply(mydata,2,var))
for (i in 2:20) wss[i] <- sum(kmeans(mydata,
                                     centers=i)$withinss)

par(mfrow=c(1,2), mar=c(2.1,2.1,2.1,1.1))

# Training progress: mean distance of the training cases to their closest codebook vector at each iteration; it should decrease and level off.
plot(som_model, type="changes", main="Training Progress")

# The node weight vectors, or “codes”, are made up of normalised values of the original variables used to generate the SOM.
# Look for the elbow point, which suggests a reasonable number of clusters

plot(1:20, wss, type="b", xlab="Number of Clusters", 
     ylab="Within groups sum of squares", main="Within cluster sum of squares (WCSS)")

# The SOM allows you to visualise how many cases are mapped to each node on the map
#plot(som_model, type="count", palette.name= coolBlueHotRed, main="Counts Plot")

# Often referred to as the “U-Matrix”, this visualisation is of the distance between each node and its neighbours.
#plot(som_model, type="dist.neighbours",palette.name= coolBlueHotRed, main="Neighbour Distance Plot")

par(mfrow=c(1, 2))
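A common way to turn the trained map into well segments is to cluster the SOM codebook vectors (one per node) hierarchically and cut the tree at the number of clusters suggested by the elbow plot. The sketch below assumes Ward linkage (`method = "ward.D2"`), k = 2, and a hypothetical 100 x 3 matrix `toy_codes` standing in for the codebook matrix (`mydata` above); the linkage and number of clusters actually used in this study may differ.

```r
set.seed(42)
# Hypothetical stand-in for the codebook matrix: 100 nodes (10 x 10 grid) x 3 scaled features
toy_codes <- rbind(matrix(rnorm(150, mean = -2), ncol = 3),
                   matrix(rnorm(150, mean =  2), ncol = 3))

hc <- hclust(dist(toy_codes), method = "ward.D2")  # Ward linkage on Euclidean distances
toy_node_cluster <- cutree(hc, k = 2)              # one cluster label per SOM node
table(toy_node_cluster)
```

Each well then inherits the cluster of the node it is mapped to, so the clustering works on 100 codebook vectors rather than on every individual well.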

SOM Visualization

This section produces the SOM maps, the SOM cluster map and the distribution of observations across the SOM units.

Assigning SOM Unit to each observation in the Training set

This section adds the SOM unit number and the cluster number of each training observation to the table.
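Once every node has a cluster label, each observation simply takes the label of its winning node. In `kohonen`, the winning unit of each training observation is stored in `som_model$unit.classif`; the sketch below uses small hypothetical vectors in its place (`unit_of_obs` for the winning units, `cluster_of_unit` for the node labels) to show that the assignment is a plain index lookup.

```r
# Hypothetical winning SOM unit (node index) for five observations
unit_of_obs <- c(1L, 1L, 7L, 12L, 7L)
# Hypothetical cluster label for each of 20 nodes (e.g. from cutree())
cluster_of_unit <- c(rep(1L, 10), rep(2L, 10))

# Each observation inherits the cluster of its winning unit
cluster_of_obs <- cluster_of_unit[unit_of_obs]
cluster_of_obs   # 1 1 1 2 1
```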

Running prediction with the Testing dataset / Saving SOM units

This section uses the trained model to predict which cluster each test observation belongs to.
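Predicting the cluster of a new well follows the same logic as the training-set assignment: scale the well with the training means and standard deviations, find the node whose codebook vector is closest (the best-matching unit), and read off that node's cluster. `kohonen` provides a `predict()` method for trained maps; the base-R sketch below only illustrates the nearest-node rule, and the function `predict_unit`, the three codebook vectors, the scaling constants and the node clusters are all hypothetical.

```r
# Return the best-matching unit (BMU) for one new observation
predict_unit <- function(x_new, center, scl, codes) {
  z <- (x_new - center) / scl            # apply the training-set scaling
  d <- colSums((t(codes) - z)^2)         # squared Euclidean distance to every node
  which.min(d)                           # index of the closest node
}

toy_codes  <- rbind(c(0, 0), c(1, 1), c(-1, 2))  # 3 hypothetical nodes, 2 features
toy_center <- c(10, 100)                          # hypothetical training means
toy_scale  <- c(2, 50)                            # hypothetical training SDs
toy_cluster_of_unit <- c(1L, 2L, 2L)              # hypothetical node clusters

unit <- predict_unit(c(12, 150), toy_center, toy_scale, toy_codes)
toy_cluster_of_unit[unit]
```

The new well scales to (1, 1), which coincides with node 2, so it is assigned to that node's cluster (2).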

Displaying cluster on every well in the training/testing set

Using Plotly only for illustration purposes

The following plotly 3D graph shows how dense a set of observations in a field can be. It also shows how useful an unsupervised ML algorithm is for characterizing or segmenting similar groups of wells within a field or across many fields.

List of potential candidates for refrac/stimulation - Subject to Production Engineer judgement

Predict whether a well is a potential candidate

test2 <- read_csv("data_4prediction.csv", 
                  col_types = cols(downtime = col_double(), boe = col_double(), b90 = col_double(), 
                                   decline = col_double(), tvd = col_double(), ltrl = col_double(), 
                                   prop_conc = col_double(), eur = col_double()
                                   )
                  )

test2 <- as_tibble(test2)  # read_csv() already returns a tibble; as_data_frame() is deprecated

head(dt)

##         uwi downtime  boe  b90 decline      tvd    ltrl prop_conc     eur
## 1 well_0836       26 7194 9006   44.89 12812.76 2997.28     19.11 3936302
## 2 well_0782       11 8666 8259   44.83 11786.44 5087.17     19.36 3925026
## 3 well_0761      209 8096 7892   44.88 12287.87 3344.06     19.12 3935733
## 4 well_0663       56 6693 5887   44.81 12587.68 3171.77     19.64 3915736
## 5 well_0623      245 6841 6104   44.74 11360.31 6287.10     19.89 3910885
## 6 well_0621      140 5758 4476   44.64 11090.08 3657.31     19.95 3904239
##     set cluster som_unit
## 1 Train       1        1
## 2 Train       1        1
## 3 Train       1        1
## 4 Train       1        1
## 5 Train       1        1
## 6 Train       1        1

head(test2)

## # A tibble: 1 × 11
##   downtime   boe   b90 decline   tvd  ltrl prop_conc   eur   set cluster
##      <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>     <dbl> <dbl> <chr>   <dbl>
## 1      300  6000  1000      45 10000  3000        15 5e+05 Test2       2
## # ... with 1 more variables: som_unit <int>