Study to create segments of oil-producing wells with similar signatures with respect to production performance behavior. This type of analysis could benefit engineers who must screen thousands of wells in their field when searching for candidates for workover or re-stimulation.
The Self-Organizing Map (SOM) model is a special class of artificial neural network based on competitive learning: the neurons compete to respond to each input pattern, and only the winning neuron and its lattice neighbours are updated. As a result, a self-organizing map forms a topographic map of the input patterns, in which the spatial locations (i.e., coordinates) of the neurons in the lattice are indicative of intrinsic statistical features contained in the input patterns, hence the name “self-organizing map.” The self-organizing map is inherently nonlinear.
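To make the competition, cooperation and adaptation steps concrete, here is a minimal base-R sketch of online SOM training. It is an illustration under stated assumptions, not the kohonen package's implementation: the helper `train_som_sketch()` is hypothetical, the lattice is rectangular, the neighbourhood is Gaussian with a linearly shrinking radius, and the learning rate declines linearly from 0.05 to 0.01.

```r
# Minimal online SOM sketch in base R (no packages needed).
# train_som_sketch() is a hypothetical helper, not part of kohonen.
train_som_sketch <- function(X, xdim = 4, ydim = 4, rlen = 50,
                             alpha = c(0.05, 0.01), seed = 1) {
  set.seed(seed)
  # Rectangular lattice: one row of (x, y) coordinates per neuron
  grid_pts <- as.matrix(expand.grid(x = seq_len(xdim), y = seq_len(ydim)))
  n_units <- nrow(grid_pts)
  grid_dist <- as.matrix(dist(grid_pts))
  # Initialise the codebook vectors with randomly drawn observations
  codes <- X[sample(nrow(X), n_units, replace = TRUE), , drop = FALSE]
  radius0 <- max(grid_dist) / 2
  n_steps <- rlen * nrow(X)
  step <- 0
  for (epoch in seq_len(rlen)) {
    for (i in sample(nrow(X))) {
      frac <- step / n_steps
      lr <- alpha[1] + (alpha[2] - alpha[1]) * frac  # learning rate decays linearly
      radius <- max(radius0 * (1 - frac), 0.5)       # neighbourhood shrinks
      x <- X[i, ]
      # Competition: the best-matching unit (BMU) is the closest codebook vector
      bmu <- which.min(colSums((t(codes) - x)^2))
      # Cooperation: Gaussian neighbourhood centred on the BMU's lattice position
      h <- exp(-grid_dist[bmu, ]^2 / (2 * radius^2))
      # Adaptation: pull every codebook vector towards x, weighted by lr * h
      codes <- codes + lr * h * (matrix(x, n_units, ncol(X), byrow = TRUE) - codes)
      step <- step + 1
    }
  }
  # Final mapping of each observation to its BMU
  bmus <- apply(X, 1, function(x) which.min(colSums((t(codes) - x)^2)))
  list(codes = codes, grid = grid_pts, unit.classif = bmus)
}

# Usage on synthetic, standardised data with two well-separated groups
set.seed(42)
X <- scale(rbind(matrix(rnorm(100, mean = 0), ncol = 2),
                 matrix(rnorm(100, mean = 5), ncol = 2)))
fit <- train_som_sketch(X, rlen = 20)
table(fit$unit.classif)
```

Because each update moves only the winner and its lattice neighbours (with weights that fade with lattice distance), neurons that are close on the map end up representing similar inputs, which is the topographic property described above.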
The goal is to divide the producing field (a sample of wells) on the basis of significant features, which could help a company streamline its candidate-selection process for workovers.
Study to create segments (clusters) of oil-producing wells with similar signatures in production performance behavior. Potential predictors or features: Best 90 producing days, Downtime, BHFP (bottom-hole flowing pressure), Pressure gradient, Reservoir pressure, Well Density, EUR, Decline rate (material balance time, or square root of time), proppant concentration, IP rate, TOC, thermal maturity, NPV, and IRR, among others.
We want to explore the data to find intrinsic structures and relationships in them. Because the data have no target attribute, this is an unsupervised learning problem; unsupervised learning is often performed as part of an exploratory data analysis.
A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map.
This type of analysis could benefit engineers looking at possibly thousands of wells in their field when searching for candidates for workover/re-stimulation, reducing the time and money needed to define a sound list of potential candidates.
Dividing the producing field (sample of wells) on the basis of some significant features will facilitate quick characterization.
The trained model can be reused with new wells in the same basin and can serve as a ‘starter’ model in other basins with a similar ‘signature’.
Because an unsupervised model needs no labels, its output (for example, the cluster assigned to each well) can be the input for a subsequent ‘supervised’ algorithm trained on observations for which we already know the answer.
The goal of unsupervised learning is to ‘discover’ interesting things about the observations and consequently the producing field.
SOM is used extensively in market analysis, health care, and banking, and in recent years the major Oil & Gas service companies have adopted it to support reservoir studies, identify production optimization patterns, and similar tasks.
knitr::opts_chunk$set(echo = TRUE)
library(readxl)
library(tidyverse)
library(stringr)
library(forecast) # Moving Average
library(imputeTS) # Time Series Missing Value Imputation
library(kohonen)
library(RcppRoll)
library(matrixStats)
library(DT)
library(cowplot)
library(RColorBrewer)
library(fields)
library(latticeExtra)
library(deldir)
library(ggplot2)
library(plotly)
# Reading dataset.csv
dataset <- read.csv("dataset.csv")
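# flag high-side outliers of wor and gor: boxplot.stats()$stats[5] is the upper
# whisker (largest value within 1.5 * IQR of the upper hinge), not a percentile;
# which() avoids an error if the column already contains NAs
dataset[which(dataset$wor > boxplot.stats(dataset$wor)$stats[5]), "wor"] <- NA
dataset[which(dataset$gor > boxplot.stats(dataset$gor)$stats[5]), "gor"] <- NA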
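The outlier threshold used here, `boxplot.stats()$stats[5]`, is the upper whisker of a boxplot rather than a percentile. The short example below, on a made-up vector `x` and a toy data frame `df` (both hypothetical), shows how far the two can differ and how `which()` lets the assignment run when the column already holds missing values.

```r
# boxplot.stats()$stats[5] is the upper whisker: the largest value lying
# within 1.5 * (hinge spread) of the upper hinge. It is NOT a percentile.
x <- c(1:10, 100)
upper_whisker <- boxplot.stats(x)$stats[5]   # 10
p95 <- unname(quantile(x, 0.95))             # 55 (default type-7 quantile)

# Flag values above the whisker as NA. which() discards NA comparisons,
# so this works even when the column already contains missing values.
df <- data.frame(wor = c(x, NA))
df[which(df$wor > upper_whisker), "wor"] <- NA
sum(is.na(df$wor))                            # 2
```

Rows flagged this way are then removed together with any other incomplete rows by `complete.cases()`.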
# convert all '0' into 'NA'
dataset[dataset == 0] <- NA
# consider only complete cases (observations)
dataset <- dataset[complete.cases(dataset),]
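# cumulative barrels of oil equivalent; /6000 treats 6,000 scf of gas as 1 bbl (cumgas presumably in scf)
dataset$boecum <- dataset$cumoil + dataset$cumgas/6000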
#dataset$uwi <- str_c("well_", str_pad(dataset$uwi, 4, side = "left", pad = "0"))
dt <- dataset
# keep the well identifier (uwi) plus the eight features that form columns 2-9 of dt
dt <- dt[,c(1,4,10,13,17,18,19,20,23)]
# randomly select 80% of the wells for the training set
set.seed(123)
rowtrain <- sample(nrow(dt), round(0.8 * nrow(dt)), replace = FALSE)
# Unsupervised learning **does not need** a test set; 20% of the wells are held out
# anyway so they can be mapped onto the trained SOM or reused if supervised learning is needed
train <- dt[rowtrain, c(2:9)]
test <- dt[-rowtrain, c(2:9)]
dt[rowtrain, "settype"] <- "Train"
dt[-rowtrain, "settype"] <- "Test"
| This section trains the SOM using the Kohonen method |
# ------------------- SOM TRAINING ---------------------------
# now train the SOM using the Kohonen method
data_train_matrix <- as.matrix(scale(train))
colnames(data_train_matrix) <- names(train)
# scale the test set with the TRAINING mean and standard deviation
data_test <- scale(test,
                   center = attr(data_train_matrix, "scaled:center"),
                   scale = attr(data_train_matrix, "scaled:scale"))
data_test_matrix <- as.matrix(data_test)
colnames(data_test_matrix) <- names(test)
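Scaling the test set with the training center and scale keeps both sets in the same coordinate system, so a held-out well is compared with codebook vectors on the training scale. The toy data frames below (`train_toy`, `test_toy`) are synthetic and only illustrate that `scale()` with the training attributes equals the manual (x - mean_train) / sd_train computation.

```r
# Standardise a hold-out set with the TRAINING mean and standard deviation.
set.seed(1)
train_toy <- data.frame(a = rnorm(20, 10, 2), b = rnorm(20, 50, 5))  # synthetic
test_toy  <- data.frame(a = rnorm(5, 10, 2),  b = rnorm(5, 50, 5))   # synthetic

train_scaled <- scale(train_toy)
test_scaled  <- scale(test_toy,
                      center = attr(train_scaled, "scaled:center"),
                      scale  = attr(train_scaled, "scaled:scale"))

# Same result computed by hand, column by column: (x - mean_train) / sd_train
manual <- sweep(sweep(as.matrix(test_toy), 2, colMeans(train_toy)),
                2, apply(train_toy, 2, sd), "/")
all.equal(as.vector(test_scaled), as.vector(manual))   # TRUE
```

Scaling the test set with its own statistics instead would shift and stretch it differently from the training data, so its wells could land on the wrong SOM units.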
som_grid <- somgrid(xdim = 10, ydim = 10, topo="hexagonal")
# Train the SOM model!
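#
# rlen:  number of times the complete training set is presented to the network
# alpha: learning rate, declining linearly from 0.05 to 0.01
# n.hood (a kohonen 2.x argument) is omitted: som() in kohonen 3.x does not
# accept it, and "circular" was already the 2.x default for hexagonal grids
#
system.time(som_model <- som(data_train_matrix,
                             grid = som_grid,
                             rlen = 200,
                             alpha = c(0.05, 0.01),
                             keep.data = TRUE)
)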
| This section clusters the SOM results and estimates the number of clusters from the WCSS |
my.par <- par(mfrow=c(1, 2))
# -------------------- SOM VISUALISATION -----------------
#Visualise the SOM model results
# Plot of the training progress - how the node distances have stabilised over time.
coolBlueHotRed <- function(n, alpha = 1) {
rainbow(n, end=4/6, alpha=alpha)[n:1]
}
# Colour palette definition
pretty_palette <- c("#1f77b4", '#2ca02c', '#ff7f0e', '#d62728', '#9467bd', '#8c564b', '#e377c2')
# ------------ Strong blue, Dark lime green, Vivid orange, Strong red, Slightly desaturated violet, Dark moderate red, Soft pink
# show the WCSS metric for kmeans for different clustering sizes.
# Can be used as a "rough" indicator of the ideal number of clusters
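# kohonen >= 3.0 stores the codebook as a list of matrices (one per data layer)
mydata <- if (is.list(som_model$codes)) som_model$codes[[1]] else som_model$codes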
wss <- (nrow(mydata)-1)*sum(apply(mydata,2,var))
for (i in 2:20) wss[i] <- sum(kmeans(mydata,
centers=i)$withinss)
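The WCSS ("elbow") curve can be sketched on synthetic data. `codes_toy` is a made-up stand-in for the SOM codebook matrix with three well-separated groups. `nstart = 10` is a standard `kmeans()` argument that is not used in the code above; it is added here as an assumption to reduce the run-to-run variation that a single random start can cause.

```r
# WCSS curve on a synthetic, codebook-like matrix (three groups in 2-D)
set.seed(7)
codes_toy <- rbind(matrix(rnorm(60, 0, 0.3), ncol = 2),
                   matrix(rnorm(60, 3, 0.3), ncol = 2),
                   matrix(rnorm(60, 6, 0.3), ncol = 2))

k_max <- 8
wss <- numeric(k_max)
# k = 1: within-cluster SS equals the total SS, (n - 1) * sum of column variances
wss[1] <- (nrow(codes_toy) - 1) * sum(apply(codes_toy, 2, var))
for (k in 2:k_max) {
  # nstart = 10 reruns k-means from 10 random starts and keeps the best fit
  wss[k] <- kmeans(codes_toy, centers = k, nstart = 10)$tot.withinss
}
round(wss, 1)
# For this toy data the curve should drop sharply up to k = 3 and then
# flatten: the "elbow" that suggests three groups.
```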
par(mfrow=c(1,2), mar=c(2.1,2.1,2.1,1.1))
# Training progress: mean distance from each observation to its closest codebook vector at each iteration; a flattening curve indicates convergence
plot(som_model, type="changes", main="Training Progress")
# The node weight vectors, or “codes”, are made up of normalised values of the original variables used to generate the SOM.
# Look for the elbow point, which suggests a suitable number of clusters
plot(1:20, wss, type="b", xlab="Number of Clusters",
ylab="Within groups sum of squares", main="Within cluster sum of squares (WCSS)")
# The SOM allows us to visualise how many cases are mapped to each node on the map
#plot(som_model, type="count", palette.name= coolBlueHotRed, main="Counts Plot")
# Often referred to as the “U-Matrix”, this visualisation is of the distance between each node and its neighbours.
#plot(som_model, type="dist.neighbours",palette.name= coolBlueHotRed, main="Neighbour Distance Plot")
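The neighbour-distance (U-matrix) idea can be sketched in base R. The helpers `hex_grid()` and `u_matrix()` are hypothetical; the sketch assumes a unit-spacing hexagonal lattice with odd rows shifted by 0.5, and it aggregates with the mean, which may differ from the exact aggregation used by kohonen's `plot(type = "dist.neighbours")`.

```r
# Hexagonal lattice with unit spacing (assumption: odd rows shifted by 0.5)
hex_grid <- function(xdim, ydim) {
  pts <- as.matrix(expand.grid(x = seq_len(xdim), y = seq_len(ydim)))
  pts[, 1] <- pts[, 1] + 0.5 * (pts[, 2] %% 2)
  pts[, 2] <- pts[, 2] * sqrt(3) / 2
  pts
}

# For each node: mean distance between its codebook vector and those of its
# immediate lattice neighbours (nodes exactly one lattice unit away)
u_matrix <- function(codes, grid_pts) {
  lattice_d <- as.matrix(dist(grid_pts))
  code_d    <- as.matrix(dist(codes))
  sapply(seq_len(nrow(codes)), function(i) {
    nb <- which(abs(lattice_d[i, ] - 1) < 1e-8)
    mean(code_d[i, nb])
  })
}

# Toy 3 x 3 map: every codebook vector is (0, 0) except the centre node
pts   <- hex_grid(3, 3)
codes <- matrix(0, nrow = 9, ncol = 2)
codes[5, ] <- c(1, 0)
u <- u_matrix(codes, pts)
round(u, 3)
# Node 5 gets the largest value (1); nodes adjacent to it get smaller,
# non-zero values; nodes not touching it get 0.
```

High values mark units that sit on a boundary between groups of dissimilar wells, which is why the U-matrix is often used to judge cluster borders on the map.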
par(mfrow=c(1, 2))
| This section produces SOM Maps, SOM Clusters Maps and Distribution of observations in the SOM Units Map |
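The section body is not shown here; one common approach, consistent with the k-means WCSS analysis in the previous section, is to cluster the codebook vectors and let each observation inherit the cluster of its winning unit. In the sketch below, `codes_toy` and `unit_of_obs` are synthetic stand-ins for the trained codebook and the per-observation winning unit (`unit.classif`), and k = 2 is an arbitrary choice for the toy data.

```r
# Synthetic stand-ins for a trained 16-unit SOM
set.seed(11)
n_units   <- 16
codes_toy <- rbind(matrix(rnorm(16, -2, 0.2), ncol = 2),   # units 1-8
                   matrix(rnorm(16,  2, 0.2), ncol = 2))   # units 9-16
unit_of_obs <- sample(n_units, 50, replace = TRUE)         # winning unit per well

# 1) Cluster the SOM units on their codebook vectors (k chosen from the elbow)
unit_cluster <- kmeans(codes_toy, centers = 2, nstart = 10)$cluster

# 2) Distribution of observations over SOM units (empty units count as 0)
obs_per_unit <- tabulate(unit_of_obs, nbins = n_units)

# 3) Each observation inherits the cluster of its winning unit
obs_cluster <- unit_cluster[unit_of_obs]
table(obs_cluster)
```

Clustering the (fewer, smoother) codebook vectors rather than the raw wells keeps the cluster boundaries aligned with the SOM map.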
| This section adds the predicted cluster number and SOM unit number of each observation to the table |
| This section uses the trained model to predict which cluster each test observation belongs to |
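Predicting for a new well amounts to finding its best-matching unit (BMU), the codebook vector at the smallest Euclidean distance, and reading off that unit's cluster. The new rows must already be scaled with the training center and scale. `assign_bmu()` is a hypothetical base-R helper, and the 4-unit codebook and its cluster labels are made up; the kohonen package also provides `map()` and `predict()` methods for trained models (see its documentation).

```r
# Assign (already standardised) new observations to their BMU
assign_bmu <- function(newdata, codes) {
  apply(newdata, 1, function(x) which.min(colSums((t(codes) - x)^2)))
}

codes_toy    <- rbind(c(0, 0), c(0, 5), c(5, 0), c(5, 5))  # 4 units, 2 features
unit_cluster <- c(1, 1, 2, 2)                              # cluster of each unit
new_obs      <- rbind(c(0.2, 4.7), c(4.9, 0.3), c(-0.1, 0.1))

bmu <- assign_bmu(new_obs, codes_toy)   # 2 3 1
data.frame(som_unit = bmu, cluster = unit_cluster[bmu])
```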
The following plotly 3D graph shows how dense a set of observations in a field can be. It illustrates why an unsupervised ML algorithm is useful when characterizing or segmenting groups of similar wells in a selected field.