This study creates segments of oil-producing wells with similar production performance signatures. This type of analysis can help engineers who look at possibly thousands of wells in a field find candidates for workover or re-stimulation.

The Self-Organizing Maps (SOM) model is a special class of Artificial Neural Networks, which is based on competitive learning. A self-organizing map is therefore characterized by the formation of a topographic map of the input patterns, in which the spatial locations (i.e., coordinates) of the neurons in the lattice are indicative of intrinsic statistical features contained in the input patterns—hence, the name “self-organizing map.” The self-organizing map is inherently nonlinear.


Joint effort between CPQ Energy & Analytics & CSE Icon


What is Well Segmentation?

Well segmentation divides the producing field (a sample of wells) into groups on the basis of significant features, which can help a company streamline candidate selection for workover.

Study Objectives

Create segments (clusters) of oil-producing wells with similar production performance signatures. Potential predictors or features: best 90 producing days, downtime, BHFP, pressure gradient, reservoir pressure, well density, EUR, decline rate (material balance time, or square root of time), proppant concentration, IP rate, TOC, thermal maturity, NPV, IRR, among others.

What is Unsupervised Learning?

We want to explore the data to find some intrinsic structures and relationships in them. The data have no target attribute. Unsupervised learning is often performed as part of an exploratory data analysis.

What is a Self-Organizing Map in Unsupervised Learning?

A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map.
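
To make the competitive-learning idea concrete, here is a minimal base R sketch of a single SOM training step. It is not the kohonen implementation: the 3 x 3 rectangular lattice, the hard-edged neighbourhood and the names `som_step`, `grid_xy`, `codes` and `x_obs` are assumptions chosen for illustration.

```r
# Illustrative only: one SOM update on a toy 3 x 3 rectangular lattice.
set.seed(1)
grid_xy <- as.matrix(expand.grid(x = 1:3, y = 1:3))  # lattice coordinates of 9 units
codes   <- matrix(runif(9 * 2), nrow = 9, ncol = 2)  # one 2-feature weight vector per unit
x_obs   <- c(0.9, 0.1)                               # a single (already scaled) observation

som_step <- function(codes, grid_xy, x_obs, alpha = 0.05, radius = 1) {
  # 1. Competition: the best matching unit (BMU) has the closest weight vector.
  bmu <- which.min(rowSums(sweep(codes, 2, x_obs)^2))
  # 2. Cooperation: units within `radius` of the BMU on the lattice are its neighbours.
  grid_d <- sqrt(rowSums(sweep(grid_xy, 2, grid_xy[bmu, ])^2))
  hood   <- grid_d <= radius
  # 3. Adaptation: move the neighbourhood's weights a fraction alpha toward x_obs.
  delta <- sweep(codes[hood, , drop = FALSE], 2, x_obs)  # codes - x_obs
  codes[hood, ] <- codes[hood, , drop = FALSE] - alpha * delta
  list(codes = codes, bmu = bmu, updated = which(hood))
}

res <- som_step(codes, grid_xy, x_obs)
res$bmu      # index of the winning unit
res$updated  # units whose weights were moved
```

Repeating this step over many observations, while shrinking `alpha` and `radius`, is what lets neighbouring units on the map end up representing similar wells.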

Benefits

This type of analysis can help engineers who look at possibly thousands of wells in a field when searching for workover or re-stimulation candidates, reducing the time and money needed to define a sound list of potential candidates.

Additional Benefits

knitr::opts_chunk$set(echo = TRUE)

Load required packages

library(readxl)
library(tidyverse)
library(stringr)
library(forecast) # Moving Average
library(imputeTS) # Time Series Missing Value Imputation
library(kohonen)
library(RcppRoll)
library(matrixStats)
library(DT)
library(cowplot)
library(RColorBrewer)
library(fields)
library(latticeExtra)
library(deldir)
library(ggplot2)
library(plotly)

Reading a cleaned dataset - statistically significant predictors

# Reading dataset.csv

dataset <- read.csv("dataset.csv")

Transforming dataset

# mask wor and gor values above the upper boxplot whisker
# (boxplot.stats()$stats[5] is the upper whisker, not a percentile)
dataset[dataset$wor > boxplot.stats(dataset$wor)$stats[5], 9] <- NA
dataset[dataset$gor > boxplot.stats(dataset$gor)$stats[5], 8] <- NA

# convert all '0' into 'NA'
dataset[dataset == 0] <- NA

# consider only complete cases (observations)
dataset <- dataset[complete.cases(dataset),]

dataset$boecum <- dataset$cumoil + dataset$cumgas/6000 
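
As a small illustration of the outlier rule used above, the sketch below applies `boxplot.stats()` to a toy vector. The values are made up and the name `wor_toy` is hypothetical; only the masking logic mirrors the chunk.

```r
# Toy stand-in for a ratio column such as wor (values are invented).
wor_toy <- c(0.2, 0.4, 0.5, 0.6, 0.8, 1.0, 1.1, 9.5)

bs <- boxplot.stats(wor_toy)
upper_whisker <- bs$stats[5]   # largest value that is not flagged as an outlier

# Replace everything above the upper whisker with NA, as done for wor and gor.
wor_masked <- replace(wor_toy, wor_toy > upper_whisker, NA)

bs$out       # values flagged as outliers by the 1.5 * IQR rule
wor_masked
```

Because the later `complete.cases()` filter drops any row containing an `NA`, masking a single column this way removes the whole well record from the analysis.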

Setup Training and Test data

dt <- dataset
dt <- dt[,c(1,4,10,13,17,18,19,20,23)]

# number of samples for training set
set.seed(123)
rowtrain <- sample(1:length(dt$uwi), round(.8 * length(dt$uwi), 0), replace=FALSE)

# Unsupervised learning does not need test data; we hold out 20% of the
# observations anyway in case we later want to use supervised learning

train <- dt[rowtrain, c(2:9)]
test <- dt[-rowtrain, c(2:9)]

dt[rowtrain, "settype"] <- "Train"
dt[-rowtrain, "settype"] <- "Test"

Setup Grid and SOM Model

This section trains the SOM using the Kohonen method
# ------------------- SOM TRAINING ---------------------------

# now train the SOM using the Kohonen method
data_train_matrix <- as.matrix(scale(train))
colnames(data_train_matrix) <- names(train)  # colnames(), not names(), sets matrix column labels

data_test <- scale(test, 
                   center = attr(data_train_matrix, "scaled:center"), 
                   scale = attr(data_train_matrix, "scaled:scale"))

data_test_matrix <- as.matrix(data_test)
colnames(data_test_matrix) <- names(test)  # colnames(), not names(), sets matrix column labels

som_grid <- somgrid(xdim = 10, ydim = 10, topo="hexagonal")

# Train the SOM model!
#
# rlen: Number of Iterations
#
system.time(som_model <- som(data_train_matrix, 
                             grid=som_grid, 
                             rlen=200, 
                             alpha=c(0.05,0.01), 
                             n.hood = "circular",
                             keep.data = TRUE)
            )
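
One detail of the chunk above worth isolating is that the test set is scaled with the training set's center and scale, so both sets live in the same feature space. The sketch below checks this on synthetic data; `train_toy` and `test_toy` and their values are assumptions, not the study's data.

```r
# Synthetic stand-ins for two of the predictors (values are invented).
set.seed(42)
train_toy <- data.frame(eur = rnorm(50, 4e6, 2e4), tvd = rnorm(50, 12000, 600))
test_toy  <- data.frame(eur = rnorm(10, 4e6, 2e4), tvd = rnorm(10, 12000, 600))

# scale() returns a matrix and stores the parameters it used as attributes.
train_scaled <- scale(train_toy)

# Reuse the training parameters so test rows are comparable to training rows.
test_scaled <- scale(test_toy,
                     center = attr(train_scaled, "scaled:center"),
                     scale  = attr(train_scaled, "scaled:scale"))

colnames(train_scaled)   # column labels survive scaling
colMeans(train_scaled)   # approximately zero by construction
colMeans(test_scaled)    # generally not zero: test rows use training parameters
```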

Hierarchical Clustering and WCSS

This section clusters the SOM results and uses the within-cluster sum of squares (WCSS) to estimate the number of clusters
my.par <- par(mfrow=c(1, 2))

# -------------------- SOM VISUALISATION -----------------

#Visualise the SOM model results
# Plot of the training progress - how the node distances have stabilised over time.

coolBlueHotRed <- function(n, alpha = 1) {
  rainbow(n, end=4/6, alpha=alpha)[n:1]
}

# Colour palette definition
pretty_palette <- c("#1f77b4", '#2ca02c', '#ff7f0e', '#d62728', '#9467bd', '#8c564b', '#e377c2')
# ------------ Strong blue, Dark lime green, Vivid orange, Strong red, Slightly desaturated violet, Dark moderate red, Soft pink

# show the WCSS metric for kmeans for different clustering sizes.
# Can be used as a "rough" indicator of the ideal number of clusters
mydata <- som_model$codes
wss <- (nrow(mydata)-1)*sum(apply(mydata,2,var))
for (i in 2:20) wss[i] <- sum(kmeans(mydata,
                                     centers=i)$withinss)

par(mfrow=c(1,2), mar=c(2.1,2.1,2.1,1.1))

# Training progress: the mean distance from each case to its closest node, per iteration.
plot(som_model, type="changes", main="Training Progress")

# The node weight vectors, or “codes”, are made up of normalised values of the original variables used to generate the SOM.
# Look for the elbow point, which suggests the number of clusters

plot(1:20, wss, type="b", xlab="Number of Clusters", 
     ylab="Within groups sum of squares", main="Within cluster sum of squares (WCSS)")

# The SOM lets you visualise how many cases are mapped to each node on the map
#plot(som_model, type="count", palette.name= coolBlueHotRed, main="Counts Plot")

# Often referred to as the “U-Matrix”, this visualisation is of the distance between each node and its neighbours.
#plot(som_model, type="dist.neighbours",palette.name= coolBlueHotRed, main="Neighbour Distance Plot")

par(mfrow=c(1, 2))
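
The heading mentions hierarchical clustering, while the chunk above only computes the k-means WCSS curve. A minimal sketch of how the node weight vectors could be grouped hierarchically is shown below. The codebook `codes_toy` is synthetic, and the choice of Ward linkage and `k = 3` are assumptions, not the settings used in the study.

```r
# Hypothetical codebook: 100 units x 3 scaled features with three loose groups.
set.seed(7)
codes_toy <- rbind(
  matrix(rnorm(40 * 3, mean = -2), ncol = 3),
  matrix(rnorm(30 * 3, mean =  0), ncol = 3),
  matrix(rnorm(30 * 3, mean =  2), ncol = 3)
)

# Agglomerative clustering on Euclidean distances between unit weight vectors.
hc <- hclust(dist(codes_toy), method = "ward.D2")

# Cut the tree at the cluster count suggested by the WCSS elbow (assumed 3 here).
unit_cluster <- cutree(hc, k = 3)

table(unit_cluster)   # number of SOM units per cluster
```

With a fitted model, the resulting vector has one cluster label per SOM unit, which is the form that kohonen's `add.cluster.boundaries()` expects when drawing cluster outlines on a map.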

SOM Visualization

This section produces SOM Maps, SOM Clusters Maps and Distribution of observations in the SOM Units Map


Assigning SOM Unit to each observation in the Training set

This section assigns predicted observations to the table with Cluster # and SOM Unit #
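
The code for this step is not shown here, so the following is only a sketch of the lookup involved, under stated assumptions: `unit_classif` stands in for the winning unit per training row (kohonen stores this as `som_model$unit.classif`), and `unit_cluster` stands in for a cluster label per SOM unit. All values are invented.

```r
# Hypothetical inputs (invented values, 9 SOM units, 6 training wells).
unit_classif <- c(1, 1, 4, 9, 4, 2)            # winning SOM unit for each training row
unit_cluster <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)   # cluster label for each SOM unit

wells <- data.frame(uwi = sprintf("well_%04d", 1:6), stringsAsFactors = FALSE)

# Attach the SOM unit, then look up that unit's cluster by index.
wells$som_unit <- unit_classif
wells$cluster  <- unit_cluster[unit_classif]

wells
```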

Running prediction with the Testing dataset / Saving SOM units

This section uses the trained model to predict the cluster to which each test observation belongs
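
The prediction code itself is not included above. Conceptually, a new observation is mapped to the SOM unit whose weight vector is closest, and it inherits that unit's cluster. The base R sketch below shows that idea on invented numbers. The helper `map_to_units` is hypothetical, and the inputs are assumed to be already scaled with the training parameters; the kohonen package provides its own `map()` method for fitted models.

```r
# Hypothetical helper: index of the nearest codebook vector for each row.
map_to_units <- function(new_scaled, codes) {
  apply(new_scaled, 1, function(row) {
    # t(codes) is features x units, so `row` is subtracted from every unit's vector.
    which.min(colSums((t(codes) - row)^2))
  })
}

# Invented codebook (3 units x 2 features) and cluster label per unit.
codes_toy    <- rbind(c(-1, -1), c(0, 0), c(1, 1))
unit_cluster <- c(1, 1, 2)

# Two new, already scaled observations.
new_scaled <- rbind(c(-0.9, -1.2), c(0.8, 1.1))

new_unit    <- map_to_units(new_scaled, codes_toy)
new_cluster <- unit_cluster[new_unit]

data.frame(som_unit = new_unit, cluster = new_cluster)
```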

Displaying cluster on every well in the training/testing set


Using Plotly only for illustration purposes


The following plotly 3D graph shows how dense a set of observations in a field can be. It illustrates how powerful and handy an unsupervised ML algorithm is for characterizing or segmenting similar groups of wells in a field or across many fields.



List of potential candidates for refrac/stimulation - Subject to Production Engineer judgement


Predict whether a well is a potential candidate

test2 <- read_csv("data_4prediction.csv", 
                  col_types = cols(downtime = col_double(), boe = col_double(), b90 = col_double(), 
                                   decline = col_double(), tvd = col_double(), ltrl = col_double(), 
                                   prop_conc = col_double(), eur = col_double()
                                   )
                  )

test2 <- as_tibble(test2)  # as_data_frame() is deprecated in favour of as_tibble()
head(dt)
##         uwi downtime  boe  b90 decline      tvd    ltrl prop_conc     eur
## 1 well_0836       26 7194 9006   44.89 12812.76 2997.28     19.11 3936302
## 2 well_0782       11 8666 8259   44.83 11786.44 5087.17     19.36 3925026
## 3 well_0761      209 8096 7892   44.88 12287.87 3344.06     19.12 3935733
## 4 well_0663       56 6693 5887   44.81 12587.68 3171.77     19.64 3915736
## 5 well_0623      245 6841 6104   44.74 11360.31 6287.10     19.89 3910885
## 6 well_0621      140 5758 4476   44.64 11090.08 3657.31     19.95 3904239
##     set cluster som_unit
## 1 Train       1        1
## 2 Train       1        1
## 3 Train       1        1
## 4 Train       1        1
## 5 Train       1        1
## 6 Train       1        1
head(test2)
## # A tibble: 1 × 11
##   downtime   boe   b90 decline   tvd  ltrl prop_conc   eur   set cluster
##      <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>     <dbl> <dbl> <chr>   <dbl>
## 1      300  6000  1000      45 10000  3000        15 5e+05 Test2       2
## # ... with 1 more variables: som_unit <int>