Oil production is a capital-intensive business. A global estimate puts the cost of drilling an onshore well at between USD 1 million and USD 2 million. Drilling does not always find oil, but when a reserve is found the driller has hit the jackpot: a productive well can produce oil for years.
A newly producing oil well has natural pressure that pushes oil to the surface. As oil is extracted, however, the well pressure drops and the oil has to be lifted artificially. Artificial lifting allows oil to be extracted from wells with no natural flow, but installing it requires investment and increases operating costs.
Artificial lifts and related costs
The standard method of artificial lifting is the beam pump, also known as the sucker-rod pump (SRP). The sucker-rod pump transfers its vertical bobbing motion to a reciprocating downhole pump inside the well. It acts like a bike-pump in reverse, pumping oil out of a reservoir.
Sucker-rod pump
Artificial lifting adds overhead and operating costs, so it is imperative that operations run efficiently. Pumping efficiency is monitored using the downhole dynamometer graph, or dynocard. A dynocard shows how the pump inside the well is functioning. A good dynocard has an almost square shape.
Full pump dynocard
Pumps that do not function efficiently produce dynocards that deviate from this shape.
Standing valve leak. Pump is not efficient as it returns some of the oil back into the reservoir, instead of pumping it to the surface.
Pumped-off well. This indicates that the downhole pump is not well submerged in the fluid.
Fluid pound. This indicates that the pumping rate is higher than the reservoir production.
Inefficiencies lower production and increase costs. The dynocard can also reveal unfavorable operating conditions such as fluid pound or a pumped-off well. These conditions increase wear and tear, which in the long run may damage the pump and force downtime for repairs and replacement.
The downhole card is usually reviewed by petroleum engineers once a day, usually in the morning. However, production conditions may change throughout the day. Oil production runs 24 hours a day, 7 days a week, and it is impossible for a petroleum engineer to be on standby the whole time.
A 10% decrease in efficiency lasting 12 hours (outside working hours) in a conservative well in Indonesia (300 BOPD) reduces production by 15 barrels of oil. At the current price of USD 42 (Rp 630,000) per barrel, the loss to the company amounts to roughly Rp 10 million per day. This may seem small, but to put things in perspective, Pertamina produced an average of 414,400 barrels of oil per day (BOPD) in 2019 [1]. Remedying inefficiencies like the one in this scenario could add up to Rp 13 billion in revenue per day.
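The arithmetic behind these figures can be sketched in a few lines of R. The variable names are illustrative, the inputs are the numbers quoted above, and the sketch assumes that the 10% loss applies uniformly over the 12 unattended hours.

```r
# Back-of-the-envelope estimate of production lost to an undetected inefficiency.
# All inputs are the figures quoted in the text; variable names are illustrative.
well_rate_bopd  <- 300      # production of a conservative well (barrels per day)
efficiency_loss <- 0.10     # 10% drop in pumping efficiency
hours_unwatched <- 12       # time the drop goes unnoticed
price_idr       <- 630000   # Rp per barrel (about USD 42)

# Barrels and revenue lost per well per day
lost_barrels <- well_rate_bopd * efficiency_loss * hours_unwatched / 24
lost_revenue <- lost_barrels * price_idr

# Same scenario applied to Pertamina's 2019 average output
fleet_rate_bopd    <- 414400
fleet_lost_revenue <- fleet_rate_bopd * efficiency_loss * hours_unwatched / 24 * price_idr

cat(lost_barrels, "barrels lost per well per day\n")
cat(format(lost_revenue, big.mark = ",", scientific = FALSE), "Rp lost per well per day\n")
cat(format(fleet_lost_revenue, big.mark = ",", scientific = FALSE), "Rp per day at fleet scale\n")
```

The per-well loss works out to 15 barrels, or Rp 9.45 million, which the text rounds to Rp 10 million; at fleet scale the same assumptions give about Rp 13 billion per day.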
A model that classifies downhole cards automatically in real time can monitor the performance of oil wells continuously, 24 hours a day, 7 days a week. Its output can be used to diagnose problems and to suggest how to improve production efficiency. This can improve efficiency and avoid costly downtime and repairs.
The goal of this project is to train and deploy a machine learning model that can classify downhole dynocards in near real time with 95% accuracy. This project is a small part of a larger effort, which is to provide actionable recommendations powered by the model.
To reach the stated goal, the project is divided into milestones:
Several published papers have utilized machine learning for the analysis of downhole dynocards.
Sharaf [2] compared the performance of trained image recognition models VGG, ResNeXt34 and ResNeXt50 on a dataset of 80 training and 16 test dynocard images. The dynocard images consist of 8 classes of dynocards: normal, gas interference, gas interference severe, fluid pound severe, standing valve leak, worn pump, stuck piston and sand production. The best model was ResNet34 with 100% accuracy, followed by VGG with 87.5% accuracy and ResNeXt50 with 56.25% accuracy.
In 2019, Sharaf et al. [3] used various methods to classify downhole card data obtained in Bahrain. The data is represented as text, with each downhole card represented by 100 pairs of scaled position vs load values. The models were trained on 22,298 manually labelled images. The resulting models achieved 99.99% validation and 100% testing accuracy.
The methodology that is used in this project is similar to the second method employed by Sharaf. However, before processing, each card is scaled uniformly and its dimensionality reduced using PCA. The diagram below summarizes the steps.
methodology
library(tidyverse) # Wrangling and visualization
library(FactoMineR) # PCA
library(umap) # Visual clustering using UMAP
library(dbscan) # Clustering using HDBSCAN
library(factoextra) # Clustering functions
library(caret) # Confusion matrix
library(kableExtra) # Pretty printing tables
The data is obtained from records of several oil wells over the course of July 2020.
data <- read_csv('downholecard_data.csv')
glimpse(data)
## Rows: 36,647
## Columns: 2
## $ rownum <dbl> 1, 11, 21, 31, 41, 51, 61, 71, 81, 91, 101, 111, 121,…
## $ CurveCoordinate <chr> "-121.55,62438;-121.7,62422;-121.52,62408;-121.01,623…
The data contains 36,647 rows and 2 columns: rownum and CurveCoordinate.
To get a sense of the data, let's look at the first row.
data[1, 'CurveCoordinate']
## # A tibble: 1 x 1
## CurveCoordinate
## <chr>
## 1 -121.55,62438;-121.7,62422;-121.52,62408;-121.01,62398;-120.17,62391;-119.02,…
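To make the format concrete, here is a minimal base-R sketch that splits such a string into numeric pairs. It assumes, judging from the output above and the axis labels used later, that pairs are separated by semicolons and that each pair is position,load. The helper name parseCard is made up for this illustration, and only the first three pairs of the row above are used.

```r
# Hypothetical helper: turn "x,y;x,y;..." into a data frame of numeric pairs.
parseCard <- function(s) {
  # One element per point, e.g. "-121.55,62438"
  pairs <- strsplit(s, ";", fixed = TRUE)[[1]]
  # Split each point on the comma and stack into a 2-column character matrix
  xy <- do.call(rbind, strsplit(pairs, ",", fixed = TRUE))
  data.frame(x = as.numeric(xy[, 1]), y = as.numeric(xy[, 2]))
}

# First three pairs of the row printed above
card <- parseCard("-121.55,62438;-121.7,62422;-121.52,62408")
print(card)
```

The result has one row per point, with x holding the position and y the load.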
Each row contains an id and a CurveCoordinate column, which holds the 100 pairs of position vs load data. We can replace rownum with a sequential id to make referencing rows easier.
# Change rownum into a sequential id
data$rownum <- seq_len(nrow(data))
Next, I introduce the function drawDynocard below to make it easier to draw dynocards.
drawDynocard <- function(d, is_clean) {
  if (!is_clean) {
    # Raw card: split the "x,y;x,y;..." string into one row per point
    row <- separate_rows(d, CurveCoordinate, sep = ";")
    # Split each point into numeric x (position) and y (load) columns
    row <- separate(row, CurveCoordinate, into = c('x', 'y'), sep = ",", convert = TRUE)
  } else {
    # Cleaned card: columns 2..201 alternate x1, y1, ..., x100, y100
    row <- tibble(
      rownum = d[[1, 1]],
      x = unlist(d[1, seq(2, 200, 2)], use.names = FALSE),
      y = unlist(d[1, seq(3, 201, 2)], use.names = FALSE)
    )
  }
  # Draw dynocard
  row %>%
    add_row(row[1, ]) %>% # Repeat the first point to close the curve
    ggplot(aes(x = x, y = y)) +
    geom_path() +
    labs(title = paste('Downhole dynocard id:', d[[1, 1]]), x = 'Position (in)', y = 'Load (lbs)')
}
Let's plot some cards.
idx <- sample(nrow(data), 5)
for (i in idx) {
  print(drawDynocard(data[i, ], is_clean = FALSE))
}
The data in its current form cannot be used for analysis. CurveCoordinate has to be expanded so that each value sits in its own column.
# Create column names x1, y1, x2, y2, ..., x100, y100
columns <- paste0(rep(c('x', 'y'), times = 100), rep(1:100, each = 2))
# Separating values into columns
data.col <- data %>%
  separate(CurveCoordinate, into = columns, sep = "[,;]", convert = TRUE)
head(data.col) %>%
  kable() %>%
  kable_styling()

| rownum | x1 | y1 | x2 | y2 | x3 | y3 | x4 | y4 | x5 | y5 | x6 | y6 | x7 | y7 | x8 | y8 | x9 | y9 | x10 | y10 | x11 | y11 | x12 | y12 | x13 | y13 | x14 | y14 | x15 | y15 | x16 | y16 | x17 | y17 | x18 | y18 | x19 | y19 | x20 | y20 | x21 | y21 | x22 | y22 | x23 | y23 | x24 | y24 | x25 | y25 | x26 | y26 | x27 | y27 | x28 | y28 | x29 | y29 | x30 | y30 | x31 | y31 | x32 | y32 | x33 | y33 | x34 | y34 | x35 | y35 | x36 | y36 | x37 | y37 | x38 | y38 | x39 | y39 | x40 | y40 | x41 | y41 | x42 | y42 | x43 | y43 | x44 | y44 | x45 | y45 | x46 | y46 | x47 | y47 | x48 | y48 | x49 | y49 | x50 | y50 | x51 | y51 | x52 | y52 | x53 | y53 | x54 | y54 | x55 | y55 | x56 | y56 | x57 | y57 | x58 | y58 | x59 | y59 | x60 | y60 | x61 | y61 | x62 | y62 | x63 | y63 | x64 | y64 | x65 | y65 | x66 | y66 | x67 | y67 | x68 | y68 | x69 | y69 | x70 | y70 | x71 | y71 | x72 | y72 | x73 | y73 | x74 | y74 | x75 | y75 | x76 | y76 | x77 | y77 | x78 | y78 | x79 | y79 | x80 | y80 | x81 | y81 | x82 | y82 | x83 | y83 | x84 | y84 | x85 | y85 | x86 | y86 | x87 | y87 | x88 | y88 | x89 | y89 | x90 | y90 | x91 | y91 | x92 | y92 | x93 | y93 | x94 | y94 | x95 | y95 | x96 | y96 | x97 | y97 | x98 | y98 | x99 | y99 | x100 | y100 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | -121.55 | 62438 | -121.70 | 62422 | -121.52 | 62408 | -121.01 | 62398 | -120.17 | 62391 | -119.02 | 62389 | -117.54 | 62390 | -115.76 | 62396 | -113.68 | 62406 | -111.30 | 62421 | -108.64 | 62441 | -105.70 | 62464 | -102.50 | 62491 | -99.04 | 62521 | -95.35 | 62553 | -91.42 | 62587 | -87.29 | 62623 | -82.96 | 62660 | -78.45 | 62698 | -73.78 | 62737 | -68.97 | 62778 | -64.04 | 62820 | -59.01 | 62864 | -53.90 | 62909 | -48.73 | 62955 | -43.53 | 63003 | -38.30 | 63052 | -33.08 | 63103 | -27.87 | 63154 | -22.71 | 63206 | -17.61 | 63258 | -12.60 | 63310 | -7.68 | 63362 | -2.89 | 63415 | 1.77 | 63466 | 6.26 | 63517 | 10.58 | 63566 | 14.71 | 63614 | 18.63 | 63660 | 22.32 | 63704 | 25.78 | 63745 | 28.98 | 63782 | 31.92 | 63817 | 34.58 | 63849 | 36.95 | 63879 | 39.03 | 63906 | 40.79 | 63932 | 42.24 | 63956 | 43.36 | 63980 | 44.16 | 64002 | 44.63 | 64023 | 44.77 | 64042 | 44.58 | 64060 | 44.06 | 64074 | 43.22 | 64084 | 42.06 | 64089 | 40.58 | 64090 | 38.80 | 64085 | 36.72 | 64074 | 34.35 | 64059 | 31.69 | 64039 | 28.75 | 64015 | 25.55 | 63989 | 22.08 | 63960 | 18.38 | 63930 | 14.45 | 63899 | 10.30 | 63867 | 5.96 | 63834 | 1.44 | 63799 | -3.23 | 63761 | -8.04 | 63720 | -12.96 | 63675 | -17.98 | 63627 | -23.08 | 63576 | -28.23 | 63521 | -33.41 | 63463 | -38.62 | 63405 | -43.82 | 63345 | -49.01 | 63287 | -54.16 | 63230 | -59.25 | 63175 | -64.27 | 63123 | -69.20 | 63074 | -74.00 | 63027 | -78.67 | 62983 | -83.19 | 62939 | -87.52 | 62898 | -91.66 | 62856 | -95.59 | 62816 | -99.29 | 62776 | -102.74 | 62736 | -105.94 | 62697 | -108.87 | 62660 | -111.52 | 62624 | -113.88 | 62590 | -115.95 | 62558 | -117.70 | 62529 | -119.15 | 62502 | -120.28 | 62478 | -121.08 | 62457 |
| 2 | 4.34 | -3650 | 3.99 | -3517 | 3.83 | -3350 | 3.86 | -3156 | 4.10 | -2944 | 4.58 | -2727 | 5.30 | -2519 | 6.30 | -2331 | 7.58 | -2175 | 9.16 | -2058 | 11.03 | -1983 | 13.18 | -1951 | 15.60 | -1955 | 18.26 | -1990 | 21.13 | -2044 | 24.19 | -2109 | 27.40 | -2173 | 30.74 | -2229 | 34.18 | -2271 | 37.71 | -2297 | 41.32 | -2306 | 44.99 | -2301 | 48.72 | -2284 | 52.50 | -2260 | 56.32 | -2233 | 60.18 | -2205 | 64.05 | -2178 | 67.94 | -2154 | 71.81 | -2132 | 75.66 | -2112 | 79.46 | -2093 | 83.19 | -2073 | 86.85 | -2054 | 90.42 | -2035 | 93.89 | -2018 | 97.24 | -2003 | 100.47 | -1990 | 103.55 | -1980 | 106.49 | -1973 | 109.25 | -1965 | 111.84 | -1956 | 114.23 | -1943 | 116.41 | -1924 | 118.37 | -1899 | 120.11 | -1869 | 121.61 | -1836 | 122.89 | -1805 | 123.94 | -1780 | 124.77 | -1769 | 125.39 | -1776 | 125.81 | -1808 | 126.03 | -1866 | 126.06 | -1952 | 125.88 | -2062 | 125.49 | -2191 | 124.88 | -2332 | 124.05 | -2478 | 122.96 | -2618 | 121.63 | -2745 | 120.04 | -2854 | 118.19 | -2940 | 116.09 | -3002 | 113.75 | -3042 | 111.18 | -3064 | 108.42 | -3073 | 105.48 | -3075 | 102.38 | -3075 | 99.14 | -3078 | 95.78 | -3087 | 92.32 | -3103 | 88.76 | -3126 | 85.13 | -3155 | 81.42 | -3188 | 77.65 | -3222 | 73.83 | -3257 | 69.98 | -3290 | 66.11 | -3322 | 62.24 | -3353 | 58.38 | -3383 | 54.55 | -3413 | 50.77 | -3444 | 47.05 | -3475 | 43.41 | -3506 | 39.86 | -3535 | 36.41 | -3562 | 33.08 | -3586 | 29.86 | -3606 | 26.79 | -3624 | 23.86 | -3640 | 21.11 | -3657 | 18.54 | -3676 | 16.17 | -3699 | 14.01 | -3727 | 12.06 | -3759 | 10.34 | -3791 | 8.83 | -3819 | 7.54 | -3836 | 6.46 | -3834 | 5.57 | -3806 | 4.86 | -3746 |
| 3 | -121.54 | 62434 | -121.70 | 62421 | -121.52 | 62413 | -121.03 | 62407 | -120.20 | 62405 | -119.05 | 62405 | -117.59 | 62409 | -115.81 | 62416 | -113.72 | 62426 | -111.34 | 62439 | -108.67 | 62456 | -105.73 | 62476 | -102.51 | 62499 | -99.05 | 62524 | -95.34 | 62553 | -91.42 | 62584 | -87.28 | 62617 | -82.94 | 62652 | -78.43 | 62689 | -73.76 | 62728 | -68.95 | 62768 | -64.02 | 62810 | -58.99 | 62853 | -53.88 | 62898 | -48.71 | 62944 | -43.50 | 62992 | -38.28 | 63042 | -33.06 | 63093 | -27.86 | 63145 | -22.70 | 63198 | -17.60 | 63251 | -12.59 | 63305 | -7.67 | 63358 | -2.88 | 63410 | 1.77 | 63461 | 6.27 | 63512 | 10.59 | 63561 | 14.72 | 63609 | 18.64 | 63656 | 22.33 | 63701 | 25.78 | 63745 | 28.98 | 63786 | 31.90 | 63826 | 34.55 | 63863 | 36.92 | 63897 | 38.98 | 63928 | 40.74 | 63956 | 42.18 | 63982 | 43.31 | 64003 | 44.12 | 64022 | 44.60 | 64038 | 44.75 | 64050 | 44.58 | 64060 | 44.08 | 64066 | 43.25 | 64069 | 42.10 | 64069 | 40.64 | 64066 | 38.86 | 64059 | 36.78 | 64048 | 34.40 | 64033 | 31.73 | 64015 | 28.79 | 63993 | 25.58 | 63968 | 22.12 | 63940 | 18.42 | 63909 | 14.49 | 63876 | 10.35 | 63841 | 6.02 | 63804 | 1.51 | 63766 | -3.16 | 63727 | -7.97 | 63687 | -12.90 | 63645 | -17.93 | 63602 | -23.04 | 63557 | -28.20 | 63511 | -33.41 | 63464 | -38.64 | 63415 | -43.86 | 63366 | -49.07 | 63315 | -54.23 | 63265 | -59.34 | 63214 | -64.36 | 63164 | -69.28 | 63115 | -74.08 | 63067 | -78.75 | 63019 | -83.25 | 62973 | -87.58 | 62927 | -91.71 | 62881 | -95.63 | 62837 | -99.32 | 62792 | -102.77 | 62749 | -105.96 | 62706 | -108.88 | 62664 | -111.53 | 62625 | -113.88 | 62587 | -115.94 | 62553 | -117.69 | 62522 | -119.13 | 62494 | -120.25 | 62470 | -121.06 | 62450 |
| 4 | 0.83 | -1741 | 0.45 | -1643 | 0.29 | -1532 | 0.35 | -1415 | 0.64 | -1298 | 1.19 | -1185 | 2.00 | -1083 | 3.07 | -994 | 4.42 | -923 | 6.03 | -869 | 7.89 | -834 | 10.00 | -814 | 12.34 | -808 | 14.88 | -812 | 17.61 | -822 | 20.51 | -835 | 23.56 | -848 | 26.73 | -859 | 30.03 | -866 | 33.43 | -871 | 36.93 | -872 | 40.51 | -871 | 44.16 | -869 | 47.87 | -866 | 51.64 | -863 | 55.43 | -860 | 59.25 | -858 | 63.07 | -854 | 66.86 | -848 | 70.62 | -839 | 74.33 | -826 | 77.96 | -807 | 81.50 | -782 | 84.94 | -752 | 88.26 | -717 | 91.45 | -679 | 94.51 | -638 | 97.42 | -595 | 100.19 | -553 | 102.79 | -512 | 105.23 | -474 | 107.50 | -437 | 109.57 | -402 | 111.46 | -370 | 113.14 | -340 | 114.60 | -311 | 115.85 | -285 | 116.87 | -261 | 117.67 | -242 | 118.24 | -229 | 118.59 | -224 | 118.71 | -229 | 118.63 | -247 | 118.34 | -279 | 117.84 | -327 | 117.15 | -391 | 116.27 | -470 | 115.20 | -562 | 113.94 | -664 | 112.47 | -772 | 110.81 | -883 | 108.94 | -991 | 106.86 | -1094 | 104.57 | -1186 | 102.07 | -1266 | 99.37 | -1331 | 96.47 | -1383 | 93.39 | -1420 | 90.15 | -1446 | 86.78 | -1464 | 83.28 | -1476 | 79.69 | -1485 | 76.04 | -1496 | 72.33 | -1510 | 68.60 | -1529 | 64.86 | -1553 | 61.12 | -1582 | 57.39 | -1615 | 53.69 | -1650 | 50.00 | -1685 | 46.36 | -1717 | 42.76 | -1746 | 39.22 | -1770 | 35.74 | -1788 | 32.36 | -1803 | 29.08 | -1814 | 25.92 | -1824 | 22.91 | -1835 | 20.07 | -1848 | 17.40 | -1864 | 14.92 | -1884 | 12.64 | -1907 | 10.56 | -1930 | 8.68 | -1951 | 7.00 | -1966 | 5.51 | -1970 | 4.21 | -1961 | 3.09 | -1934 | 2.15 | -1889 | 1.39 | -1825 |
| 5 | -12.79 | 11011 | -12.99 | 10965 | -13.00 | 10907 | -12.85 | 10855 | -12.54 | 10822 | -12.11 | 10823 | -11.56 | 10861 | -10.90 | 10931 | -10.10 | 11016 | -9.13 | 11091 | -7.98 | 11129 | -6.59 | 11101 | -4.96 | 10988 | -3.07 | 10784 | -0.93 | 10495 | 1.41 | 10147 | 3.90 | 9772 | 6.49 | 9415 | 9.11 | 9116 | 11.71 | 8908 | 14.25 | 8812 | 16.72 | 8829 | 19.13 | 8943 | 21.50 | 9121 | 23.88 | 9322 | 26.31 | 9502 | 28.85 | 9624 | 31.49 | 9666 | 34.25 | 9625 | 37.10 | 9516 | 39.97 | 9373 | 42.80 | 9241 | 45.53 | 9164 | 48.08 | 9180 | 50.43 | 9308 | 52.55 | 9544 | 54.49 | 9858 | 56.29 | 10196 | 58.04 | 10487 | 59.82 | 10657 | 61.72 | 10637 | 63.79 | 10378 | 66.06 | 9859 | 68.50 | 9093 | 71.06 | 8129 | 73.63 | 7042 | 76.08 | 5926 | 78.29 | 4883 | 80.14 | 4001 | 81.53 | 3349 | 82.42 | 2963 | 82.80 | 2841 | 82.72 | 2949 | 82.25 | 3225 | 81.51 | 3589 | 80.60 | 3962 | 79.60 | 4274 | 78.59 | 4481 | 77.58 | 4569 | 76.56 | 4555 | 75.46 | 4488 | 74.21 | 4435 | 72.71 | 4471 | 70.87 | 4666 | 68.64 | 5071 | 65.98 | 5711 | 62.92 | 6577 | 59.50 | 7630 | 55.83 | 8805 | 52.01 | 10020 | 48.17 | 11189 | 44.42 | 12231 | 40.84 | 13088 | 37.50 | 13722 | 34.40 | 14124 | 31.53 | 14313 | 28.87 | 14326 | 26.34 | 14211 | 23.89 | 14022 | 21.47 | 13804 | 19.06 | 13592 | 16.62 | 13404 | 14.19 | 13244 | 11.78 | 13104 | 9.42 | 12966 | 7.16 | 12814 | 5.02 | 12633 | 3.02 | 12419 | 1.17 | 12175 | -0.55 | 11916 | -2.14 | 11660 | -3.65 | 11428 | -5.07 | 11240 | -6.43 | 11105 | -7.72 | 11027 | -8.93 | 10997 | -10.03 | 11001 | -10.99 | 11021 | -11.78 | 11037 | -12.39 | 11036 |
| 6 | -9.67 | 1728 | -10.16 | 1831 | -10.60 | 1967 | -11.02 | 2135 | -11.41 | 2328 | -11.76 | 2539 | -12.04 | 2755 | -12.23 | 2965 | -12.26 | 3155 | -12.09 | 3315 | -11.64 | 3434 | -10.87 | 3508 | -9.74 | 3534 | -8.22 | 3515 | -6.34 | 3457 | -4.14 | 3371 | -1.69 | 3270 | 0.92 | 3166 | 3.57 | 3072 | 6.18 | 2999 | 8.66 | 2955 | 10.94 | 2943 | 13.02 | 2961 | 14.91 | 3006 | 16.66 | 3069 | 18.35 | 3141 | 20.07 | 3212 | 21.91 | 3272 | 23.94 | 3315 | 26.20 | 3336 | 28.68 | 3334 | 31.35 | 3314 | 34.13 | 3281 | 36.93 | 3242 | 39.63 | 3206 | 42.14 | 3181 | 44.39 | 3172 | 46.32 | 3182 | 47.94 | 3211 | 49.28 | 3257 | 50.39 | 3314 | 51.37 | 3376 | 52.29 | 3434 | 53.24 | 3482 | 54.26 | 3514 | 55.37 | 3527 | 56.55 | 3519 | 57.75 | 3490 | 58.91 | 3443 | 59.95 | 3380 | 60.79 | 3306 | 61.39 | 3222 | 61.72 | 3131 | 61.78 | 3033 | 61.62 | 2928 | 61.28 | 2812 | 60.85 | 2685 | 60.37 | 2546 | 59.91 | 2393 | 59.49 | 2229 | 59.11 | 2056 | 58.72 | 1881 | 58.26 | 1710 | 57.67 | 1550 | 56.86 | 1409 | 55.77 | 1294 | 54.34 | 1208 | 52.58 | 1154 | 50.49 | 1131 | 48.14 | 1135 | 45.60 | 1161 | 42.96 | 1200 | 40.29 | 1244 | 37.69 | 1287 | 35.19 | 1321 | 32.84 | 1341 | 30.61 | 1347 | 28.50 | 1338 | 26.45 | 1317 | 24.42 | 1290 | 22.35 | 1261 | 20.21 | 1237 | 17.99 | 1222 | 15.68 | 1221 | 13.31 | 1236 | 10.92 | 1266 | 8.55 | 1309 | 6.26 | 1362 | 4.08 | 1418 | 2.07 | 1474 | 0.23 | 1523 | -1.42 | 1561 | -2.89 | 1587 | -4.17 | 1600 | -5.30 | 1603 | -6.28 | 1599 | -7.14 | 1594 | -7.89 | 1598 | -8.56 | 1616 | -9.15 | 1658 |
Now that we have reshaped the data, we can check for missing values, and remove them.
colSums(is.na(data.col))
## rownum x1 y1 x2 y2 x3 y3 x4 y4 x5 y5
## 0 0 0 1 1 1 1 1 1 1 1
## x6 y6 x7 y7 x8 y8 x9 y9 x10 y10 x11
## 1 1 1 1 1 1 1 1 1 1 1
## y11 x12 y12 x13 y13 x14 y14 x15 y15 x16 y16
## 1 1 1 1 1 1 1 1 1 1 1
## x17 y17 x18 y18 x19 y19 x20 y20 x21 y21 x22
## 1 1 1 1 1 1 1 1 1 1 1
## y22 x23 y23 x24 y24 x25 y25 x26 y26 x27 y27
## 1 1 1 1 1 1 1 1 1 1 1
## x28 y28 x29 y29 x30 y30 x31 y31 x32 y32 x33
## 1 1 1 1 1 1 1 1 1 1 1
## y33 x34 y34 x35 y35 x36 y36 x37 y37 x38 y38
## 1 1 1 1 1 1 1 1 1 1 1
## x39 y39 x40 y40 x41 y41 x42 y42 x43 y43 x44
## 1 1 1 1 1 1 1 1 1 1 1
## y44 x45 y45 x46 y46 x47 y47 x48 y48 x49 y49
## 1 1 1 1 1 1 1 1 1 1 1
## x50 y50 x51 y51 x52 y52 x53 y53 x54 y54 x55
## 1 1 1 1 1 1 1 1 1 1 1
## y55 x56 y56 x57 y57 x58 y58 x59 y59 x60 y60
## 1 1 1 1 1 1 1 1 1 1 1
## x61 y61 x62 y62 x63 y63 x64 y64 x65 y65 x66
## 1 1 1 1 1 1 1 1 1 1 1
## y66 x67 y67 x68 y68 x69 y69 x70 y70 x71 y71
## 1 1 1 1 1 1 1 1 1 1 1
## x72 y72 x73 y73 x74 y74 x75 y75 x76 y76 x77
## 1 1 1 1 1 1 1 1 1 1 1
## y77 x78 y78 x79 y79 x80 y80 x81 y81 x82 y82
## 1 1 1 1 1 1 1 1 1 1 1
## x83 y83 x84 y84 x85 y85 x86 y86 x87 y87 x88
## 1 1 1 1 1 1 1 1 1 1 1
## y88 x89 y89 x90 y90 x91 y91 x92 y92 x93 y93
## 1 1 1 1 1 1 1 1 1 1 1
## x94 y94 x95 y95 x96 y96 x97 y97 x98 y98 x99
## 1 1 1 1 1 1 1 1 1 1 1
## y99 x100 y100
## 1 1 1
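The counts above suggest that a single card has fewer than 100 points: x1 and y1 are always present, while every later column is missing exactly once. Before dropping anything, it can be useful to see which rows are affected. The sketch below does this on a small made-up data frame (the values and the name toy are illustrative, not taken from the dataset); the same approach should carry over to data.col.

```r
# Toy stand-in for data.col: row 2 has only one coordinate pair
toy <- data.frame(
  rownum = 1:3,
  x1 = c(0.1, 0.4, 0.7), y1 = c(10, 40, 70),
  x2 = c(0.2, NA, 0.8),  y2 = c(20, NA, 80)
)

# Number of missing values in each row, and the rows that have any
na_per_row <- rowSums(is.na(toy))
incomplete <- toy[na_per_row > 0, ]
print(incomplete)

# Keeping only the complete rows mirrors what drop_na() does
toy_complete <- toy[na_per_row == 0, ]
```

Inspecting the incomplete rows first makes it easier to decide whether they are truncated records that can safely be discarded.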
# Dropping missing values
data.col <- data.col %>%
  drop_na()
Because the dataset is large, we'll work with a subset of it. Once we have the steps right, we can repeat them on all the data. Let's take about a quarter of the cards (~9,000).
idx <- sample(nrow(data.col), 9000)
data.sample <- data.col[idx, ]
Now we are ready to wrangle and explore our data.
The dynocards are derived from measurements of load vs position of the polished rod above ground, transformed into the downhole dynocard using the equipment parameters. What matters in a dynograph is the shape and the relative dimensions of the card, not the absolute readings. To make the data device-independent, we can normalize it to a fixed scale, which removes the dependence of the data on equipment size. Let's scale the data so it falls between 0 and 1. We can use the following function:
# Min-max scale inputs into a 1x1 box
# Note: the ranges are taken over all rows of d, so every card passed in shares one scale
scaleBox <- function(d) {
  # Get range in x
  range_x <- range(d[, seq(2, 200, 2)])
  min_x <- range_x[1]
  max_x <- range_x[2]
  # Get range in y
  range_y <- range(d[, seq(3, 201, 2)])
  min_y <- range_y[1]
  max_y <- range_y[2]
  # Min-max scale all values of x and y
  d[, seq(2, 200, 2)] <- (d[, seq(2, 200, 2)] - min_x) / (max_x - min_x)
  d[, seq(3, 201, 2)] <- (d[, seq(3, 201, 2)] - min_y) / (max_y - min_y)
  return(d)
}
data.scaled <- scaleBox(data.sample)
Let's now view our transformed dynocards.
drawDynocard(data.scaled[3, ], is_clean = TRUE)
Let's look at the principal components.
sample.pca <- PCA(data.scaled, scale.unit = F, quali.sup = c(1), graph = F, ncp = 10)
plot.PCA(x = sample.pca, choix = 'varcor', habillage = 10, select = 'contrib 10')
sample.pca$eig[1:10, ]
## eigenvalue percentage of variance cumulative percentage of variance
## comp 1 0.273619131 47.1980194 47.19802
## comp 2 0.207574392 35.8056111 83.00363
## comp 3 0.026866964 4.6344255 87.63806
## comp 4 0.017630684 3.0412104 90.67927
## comp 5 0.002495220 0.4304138 91.10968
## comp 6 0.002327396 0.4014648 91.51115
## comp 7 0.002038564 0.3516428 91.86279
## comp 8 0.001731051 0.2985983 92.16139
## comp 9 0.001579043 0.2723775 92.43376
## comp 10 0.001494278 0.2577560 92.69152
From the analysis, we can see that the first 10 PCs cover 92.69% of the total variance.
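As a sketch of how such a table can be used to choose the number of components, the snippet below computes the cumulative share of variance and picks the smallest number of components that reaches a target. It uses base R's prcomp on random data rather than FactoMineR, and the 90% threshold is an arbitrary illustration, not a value from this project.

```r
set.seed(1)
# Random stand-in data: 200 observations of 20 features driven by 3 latent factors
latent <- matrix(rnorm(200 * 3), ncol = 3)
X <- latent %*% matrix(rnorm(3 * 20), nrow = 3) +
  matrix(rnorm(200 * 20, sd = 0.1), ncol = 20)

# Centered but unscaled PCA, analogous to scale.unit = F above
pca <- prcomp(X, center = TRUE, scale. = FALSE)

eigenvalues <- pca$sdev^2                          # variance captured by each component
cum_share   <- cumsum(eigenvalues) / sum(eigenvalues)

# Smallest number of components whose cumulative share reaches the target
target <- 0.90
n_keep <- which(cum_share >= target)[1]
cat("Components needed for", target * 100, "% of variance:", n_keep, "\n")
```

Applied to the table above, the same rule would select 4 components for a 90% target, since the cumulative percentage first exceeds 90 at comp 4 (90.68%).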
Because our data is unlabeled, we have to label it ourselves. Labeling every card manually would take a lot of time, so we use clustering to group similar dynocards first. We will compare two methods for projecting the cards into two dimensions before clustering: t-SNE and UMAP.
set.seed(888)
# idx <- sample(nrow(data.col), 9000)
# data.sample <- data.col[idx, ]
# data.scaled <- scaleBox(data.sample)
# data.scaled <- scaleBox(data.)
data.scaled <- data.scaled[,-1] %>%
  distinct()
library(Rtsne)
data.tsne <- Rtsne(data.scaled)
plot(data.tsne$Y)
The t-SNE embedding contains a lot of small groups. We'll cluster these points using a density-based clustering algorithm, HDBSCAN.
data.tsne.cluster <- hdbscan(data.tsne$Y, minPts = 150)
data.tsne.cluster
## HDBSCAN clustering for 8719 objects.
## Parameters: minPts = 150
## The clustering contains 5 cluster(s) and 6230 noise points.
##
## 0 1 2 3 4 5
## 6230 218 443 160 1510 158
##
## Available fields: cluster, minPts, cluster_scores, membership_prob,
## outlier_scores, hc
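One quick way to read this output is the share of points that HDBSCAN could not assign to any cluster (label 0). The snippet below rebuilds a label vector from the counts printed above, purely so that it runs on its own; with the real object one would use data.tsne.cluster$cluster instead.

```r
# Rebuild cluster labels from the printed counts (0 = noise)
counts <- c(`0` = 6230, `1` = 218, `2` = 443, `3` = 160, `4` = 1510, `5` = 158)
labels <- rep(as.integer(names(counts)), times = counts)

noise_share <- mean(labels == 0)                    # fraction of points labelled as noise
n_clusters  <- length(setdiff(unique(labels), 0))   # clusters, excluding the noise label

cat(sprintf("%d clusters, %.1f%% of %d points are noise\n",
            n_clusters, 100 * noise_share, length(labels)))
```

With minPts = 150, more than 70% of the sampled cards are left unassigned, which is worth keeping in mind when judging how useful these clusters are for labeling.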
Now that we have the labels let's see how well the points are separated.
data.tsne.labelled <- as_tibble(data.tsne$Y) %>%
  rename(x = V1, y = V2) %>%
  mutate(label = as.factor(data.tsne.cluster$cluster))
data.tsne.labelled %>%
  ggplot(aes(x = x, y = y, col = label)) +
  geom_point()
Let's profile the clusters found on the t-SNE embedding, to see if the groups are really different.
# Let's take a cluster and see how the dots perform
# data.tsne.labelled <- data.scaled %>%
# mutate(label = data.tsne.cluster$cluster) %>%
# relocate(label, .before = x1)
#
# # Let's take a cluster and see how the dots perform
# temp <- data.tsne.labelled[1,] %>%
# group_by(label) %>%
# apply()
#
# drawDynocard(as_tibble(temp), T)
As a comparison, we'll cluster the points using the UMAP algorithm.
# Clustering and plotting
data.umap <- umap(data.scaled)
plot(data.umap$layout, type = 'p')
UMAP results in finer clusters with a lot more separation.
data.umap.cluster <- hdbscan(data.umap$layout, minPts = 100)
data.umap.cluster
## HDBSCAN clustering for 8719 objects.
## Parameters: minPts = 100
## The clustering contains 35 cluster(s) and 1776 noise points.
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## 1776 107 119 206 201 119 167 200 291 131 218 137 262 112 154 444
## 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
## 325 114 153 137 105 219 221 316 487 171 116 124 177 204 192 238
## 32 33 34 35
## 124 232 189 231
##
## Available fields: cluster, minPts, cluster_scores, membership_prob,
## outlier_scores, hc
# Label the UMAP layout with the assigned clusters
data.umap.labelled <- as_tibble(data.umap$layout) %>%
  rename(x = V1, y = V2) %>%
  mutate(label = as.factor(data.umap.cluster$cluster))
data.umap.labelled %>%
  ggplot(aes(x = x, y = y, col = as.factor(label))) +
  geom_jitter()
Visually inspecting the clustering result, it seems that UMAP has failed to cluster the data. There are no well-defined clusters to be seen.
To see if this is the case, let's profile the wells to see how effective this clustering approach is compared to the TSNE.
# data.umap.labelled <- data.scaled %>%
# mutate(label = as.factor(data.umap.cluster$cluster))
#
# # Let's take a cluster and see how the dots perform
# temp <- data.umap.labelled %>%
# filter(label == 23) %>%
# select_if(is.numeric) %>%
# map(mean)
From the clustering
[1] Reuters. 2019. Indonesia's Pertamina targets higher crude oil output in 2020. Accessed 26 Aug 2020 from https://www.reuters.com/article/indonesia-pertamina-idUSL4N2881BP
[2] Sharaf, S. A. 2018. Beam Pump Dynamometer Card Prediction Using Artificial Neural Networks. In Sustainability and Resilience Conference.
[3] Sharaf, S. A., Bangert, P., Fardan, M., Khalil, A., Abubakr, M. and Ahmed, M. 2019. Beam Pump Dynamometer Card Classification Using Machine Learning. In SPE Middle East Oil and Gas Show and Conference.