This report analyzes the variable Elevation, the ground
elevation at the sites of oil and gas wells in New York State.
It is a continuous quantitative variable measured in feet (ft)
above sea level. Descriptive statistics techniques are applied:
a grouped frequency distribution table (number of classes from
Sturges' rule), graphical analysis (histograms of absolute and
percentage frequencies, a box plot and ogives), and the main
measures of central tendency, dispersion and shape. Wells with
an elevation of 0 or with no recorded value are excluded (the
code keeps only values greater than 0).
library(readxl)
library(dplyr)
library(gt)
library(e1071)
col_principal <- "#0E6655"
col_barras <- "#16A085"
col_acento <- "#E67E22"
col_claro <- "#E8F8F5"
col_grid <- "#D7DBDD"
setwd("C:/Users/PATRICIA/Desktop/excel")
Datos <- read.csv("C:/Users/PATRICIA/Desktop/excel/oil_dataset.csv",
header = TRUE, sep = ";", dec = ",")
Variable <- suppressWarnings(as.numeric(Datos$Elevation..ft))
Variable <- na.omit(Variable)
Variable <- Variable[Variable > 0]
if (length(Variable) == 0) stop("ERROR: No hay datos validos en la variable.")
# Basic quantities for the grouped frequency table
N <- length(Variable)
min_val <- min(Variable)
max_val <- max(Variable)
Rango <- max_val - min_val
# Number of classes by Sturges' rule, truncated to an integer
K <- floor(1 + 3.322 * log10(N))
Amplitud <- Rango / K
# K equal-width classes; the last break is nudged upward so the maximum is never lost to rounding
breaks_table <- seq(min_val, max_val, length.out = K + 1)
breaks_table[length(breaks_table)] <- max_val + 0.0001
lim_inf_table <- breaks_table[1:K]
lim_sup_table <- breaks_table[2:(K + 1)]
MC <- (lim_inf_table + lim_sup_table) / 2
# Absolute frequencies: classes are [Li, Ls), except the last one, which is closed on both ends
ni <- numeric(K)
for (i in 1:K) {
  if (i < K) {
    ni[i] <- length(Variable[Variable >= lim_inf_table[i] & Variable < lim_sup_table[i]])
  } else {
    ni[i] <- length(Variable[Variable >= lim_inf_table[i] & Variable <= lim_sup_table[i]])
  }
}
hi <- (ni / sum(ni)) * 100
Ni_asc <- cumsum(ni)
Ni_desc <- rev(cumsum(rev(ni)))
Hi_asc <- cumsum(hi)
Hi_desc <- rev(cumsum(rev(hi)))
TDF_Elev <- data.frame(
Li = round(lim_inf_table, 1),
Ls = round(lim_sup_table, 1),
MC = round(MC, 1),
ni = ni,
hi = round(hi, 2),
Ni_asc = Ni_asc,
Ni_desc = Ni_desc,
Hi_asc = round(Hi_asc, 2),
Hi_desc = round(Hi_desc, 2)
)
cat("N (pozos con elevacion valida):", N, "| Clases (Sturges):", K,
"| Amplitud:", round(Amplitud, 1), "ft\n")
## N (pozos con elevacion valida): 25548 | Clases (Sturges): 15 | Amplitud: 233.1 ft
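As a hand check of the figures printed above, Sturges' rule and the class width can be worked out from the reported N = 25548 and the observed range [3; 3500]. The values are taken from the report's own output, and the intermediate decimals are rounded here:

```latex
\begin{aligned}
K &= \lfloor 1 + 3.322\,\log_{10} N \rfloor
   = \lfloor 1 + 3.322 \times 4.4074 \rfloor
   = \lfloor 15.64 \rfloor = 15, \\
A &= \frac{x_{\max} - x_{\min}}{K}
   = \frac{3500 - 3}{15} \approx 233.1\ \text{ft}.
\end{aligned}
```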
The frequency distribution table by elevation intervals follows. The modal class (the one with the highest frequency) is highlighted in orange and the totals row in green.
totales <- c("TOTAL", "-", "-", sum(ni), round(sum(hi), 2), "-", "-", "-", "-")
TDF_Char <- TDF_Elev %>% mutate(across(everything(), as.character))
TDF_Final <- rbind(TDF_Char, totales)
modal_row <- which.max(TDF_Elev$ni)
TDF_Final %>%
gt() %>%
tab_header(
title = md("**DISTRIBUCIÓN DE FRECUENCIAS — ELEVACIÓN DEL TERRENO**"),
subtitle = md("Variable: **Elevación del pozo (ft s.n.m.)** · Pozos de Nueva York")
) %>%
tab_spanner(label = "Frecuencias acumuladas",
columns = c(Ni_asc, Ni_desc, Hi_asc, Hi_desc)) %>%
cols_label(
Li = "Lím. Inf", Ls = "Lím. Sup", MC = "Marca de clase (Xi)",
ni = "ni", hi = "hi (%)",
Ni_asc = "Ni (Asc)", Ni_desc = "Ni (Desc)",
Hi_asc = "Hi (Asc)", Hi_desc = "Hi (Desc)"
) %>%
cols_align(align = "center", columns = everything()) %>%
tab_style(
style = list(cell_fill(color = col_principal), cell_text(color = "white", weight = "bold")),
locations = cells_title()
) %>%
tab_style(
style = list(cell_fill(color = "#148F77"), cell_text(color = "white", weight = "bold")),
locations = cells_column_labels()
) %>%
tab_style(
style = list(cell_fill(color = "#148F77"), cell_text(color = "white", weight = "bold")),
locations = cells_column_spanners()
) %>%
tab_style(
style = list(cell_fill(color = "#FDEBD0"), cell_text(weight = "bold")),
locations = cells_body(rows = modal_row)
) %>%
tab_style(
style = list(cell_fill(color = "#D0ECE7"), cell_text(weight = "bold")),
locations = cells_body(rows = nrow(TDF_Final))
) %>%
opt_row_striping() %>%
opt_table_font(font = google_font("Roboto")) %>%
tab_options(
table.font.size = px(13),
heading.align = "left",
heading.title.font.size = px(17),
data_row.padding = px(7),
table.border.top.color = col_principal,
table.border.bottom.color = col_principal,
column_labels.border.bottom.color = col_principal
) %>%
tab_source_note(md("*Fuente: NYS DEC — Oil, Gas & Other Regulated Wells. Elaboración: JENNY.*"))

DISTRIBUCIÓN DE FRECUENCIAS — ELEVACIÓN DEL TERRENO
Variable: Elevación del pozo (ft s.n.m.) · Pozos de Nueva York

| Lím. Inf | Lím. Sup | Marca de clase (Xi) | ni | hi (%) | Ni (Asc) | Ni (Desc) | Hi (Asc) | Hi (Desc) |
|---|---|---|---|---|---|---|---|---|
| 3 | 236.1 | 119.6 | 303 | 1.19 | 303 | 25548 | 1.19 | 100 |
| 236.1 | 469.3 | 352.7 | 464 | 1.82 | 767 | 25245 | 3 | 98.81 |
| 469.3 | 702.4 | 585.8 | 2087 | 8.17 | 2854 | 24781 | 11.17 | 97 |
| 702.4 | 935.5 | 819 | 2992 | 11.71 | 5846 | 22694 | 22.88 | 88.83 |
| 935.5 | 1168.7 | 1052.1 | 2159 | 8.45 | 8005 | 19702 | 31.33 | 77.12 |
| 1168.7 | 1401.8 | 1285.2 | 3095 | 12.11 | 11100 | 17543 | 43.45 | 68.67 |
| 1401.8 | 1634.9 | 1518.4 | 4704 | 18.41 | 15804 | 14448 | 61.86 | 56.55 |
| 1634.9 | 1868.1 | 1751.5 | 3684 | 14.42 | 19488 | 9744 | 76.28 | 38.14 |
| 1868.1 | 2101.2 | 1984.6 | 2830 | 11.08 | 22318 | 6060 | 87.36 | 23.72 |
| 2101.2 | 2334.3 | 2217.8 | 2966 | 11.61 | 25284 | 3230 | 98.97 | 12.64 |
| 2334.3 | 2567.5 | 2450.9 | 254 | 0.99 | 25538 | 264 | 99.96 | 1.03 |
| 2567.5 | 2800.6 | 2684 | 7 | 0.03 | 25545 | 10 | 99.99 | 0.04 |
| 2800.6 | 3033.7 | 2917.2 | 2 | 0.01 | 25547 | 3 | 100 | 0.01 |
| 3033.7 | 3266.9 | 3150.3 | 0 | 0 | 25547 | 1 | 100 | 0 |
| 3266.9 | 3500 | 3383.4 | 1 | 0 | 25548 | 1 | 100 | 0 |
| TOTAL | - | - | 25548 | 100 | - | - | - | - |

Frecuencias acumuladas: Ni (Asc), Ni (Desc), Hi (Asc), Hi (Desc).
Fuente: NYS DEC — Oil, Gas & Other Regulated Wells. Elaboración: JENNY.
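The loop that fills `ni` in the code above counts each class by hand. A more compact way to build the same kind of table is `cut()` followed by `table()`. The sketch below is one possible alternative, run on a made-up vector (`elev` and its values are hypothetical), and is not the report's own method:

```r
# Hypothetical toy elevations (ft); illustrative only, not from the report's data.
elev <- c(120, 480, 510, 790, 805, 1010, 1490, 1500, 1520, 2200)

# Sturges' rule, truncated as in the report.
K <- floor(1 + 3.322 * log10(length(elev)))

# K equal-width classes spanning the observed range.
brks <- seq(min(elev), max(elev), length.out = K + 1)

# right = FALSE gives [a, b) classes; include.lowest = TRUE closes the last
# class on the right so the maximum is counted.
clases <- cut(elev, breaks = brks, right = FALSE, include.lowest = TRUE)

ni <- as.vector(table(clases))   # absolute frequencies
hi <- 100 * ni / sum(ni)         # relative frequencies in percent

tdf <- data.frame(
  Li      = head(brks, -1),
  Ls      = tail(brks, -1),
  ni      = ni,
  hi      = hi,
  Ni_asc  = cumsum(ni),
  Ni_desc = rev(cumsum(rev(ni)))
)
print(tdf)
```

With `right = FALSE` and `include.lowest = TRUE` the class boundaries follow the same convention as the loop: every class is closed on the left and open on the right, except the last one, which also includes its upper limit.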
This section presents plots of the distribution of well elevation.
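# Assumption: h_base is not defined in the code shown; it is taken here to be the
# histogram object over the same K equal-width classes as the frequency table.
h_base <- hist(Variable, breaks = seq(min_val, max_val, length.out = K + 1),
right = FALSE, include.lowest = TRUE, plot = FALSE)
par(mar = c(8, 5, 4, 2))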
plot(h_base,
main = "Gráfica N°1: Distribución de la elevación del terreno de los pozos (NY)",
xlab = "Elevación (ft s.n.m.)", ylab = "Frecuencia absoluta (n° de pozos)",
col = col_barras, border = "white", axes = FALSE, cex.main = 0.95,
ylim = c(0, max(h_base$counts) * 1.1))
axis(1, at = round(h_base$breaks), labels = format(round(h_base$breaks), scientific = FALSE),
las = 2, cex.axis = 0.7)
axis(2)
grid(nx = NA, ny = NULL, col = col_grid, lty = "dotted")
par(mar = c(8, 5, 4, 2))
plot(h_base,
main = "Gráfica N°2: Distribución global de la elevación",
xlab = "Elevación (ft s.n.m.)", ylab = "N° de pozos",
col = col_barras, border = "white", axes = FALSE, cex.main = 0.95,
ylim = c(0, sum(h_base$counts)))
axis(1, at = round(h_base$breaks), labels = format(round(h_base$breaks), scientific = FALSE),
las = 2, cex.axis = 0.7)
axis(2)
grid(nx = NA, ny = NULL, col = col_grid, lty = "dotted")
# Rescale the counts to percentages of the total for the percentage histograms
h_porc <- h_base
h_porc$counts <- (h_porc$counts / sum(h_porc$counts)) * 100
h_porc$density <- h_porc$counts
y_max <- ceiling(max(h_porc$counts) / 5)*5
par(mar = c(8, 5, 4, 2))
plot(h_porc,
main = "Gráfica N°3: Distribución porcentual de la elevación",
xlab = "Elevación (ft s.n.m.)", ylab = "Porcentaje (%)",
col = col_barras, border = "white", axes = FALSE, freq = TRUE, cex.main = 0.95,
ylim = c(0, y_max))
axis(1, at = round(h_base$breaks), labels = format(round(h_base$breaks), scientific = FALSE),
las = 2, cex.axis = 0.7)
axis(2)
text(x = h_base$mids, y = h_porc$counts, labels = paste0(round(h_porc$counts, 1), "%"),
pos = 3, cex = 0.6, col = col_principal)
grid(nx = NA, ny = NULL, col = col_grid, lty = "dotted")
par(mar = c(8, 5, 4, 2))
plot(h_porc,
main = "Gráfica N°4: Distribución porcentual global (elevación)",
xlab = "Elevación (ft s.n.m.)", ylab = "% del total",
col = col_barras, border = "white", axes = FALSE, freq = TRUE, cex.main = 0.95,
ylim = c(0, 100))
axis(1, at = round(h_base$breaks), labels = format(round(h_base$breaks), scientific = FALSE),
las = 2, cex.axis = 0.7)
axis(2)
text(x = h_base$mids, y = h_porc$counts, labels = paste0(round(h_porc$counts, 1), "%"),
pos = 3, cex = 0.6, col = col_principal)
abline(h = seq(0, 100, 20), col = col_grid, lty = "dotted")
par(mar = c(5, 5, 4, 2))
boxplot(Variable, horizontal = TRUE, col = col_barras,
main = "Gráfica N°5: Diagrama de caja de la elevación",
xlab = "Elevación (ft s.n.m.)", outline = TRUE, outpch = 19,
outcol = col_acento, boxwex = 0.5, frame.plot = FALSE, xaxt = "n")
eje_x <- pretty(Variable, n = 15)
axis(1, at = eje_x, labels = format(eje_x, scientific = FALSE), cex.axis = 0.7, las = 2)
grid(nx = NULL, ny = NA, col = col_grid, lty = "dotted")
# Ogives: ascending and descending cumulative frequencies over the class limits
par(mar = c(5, 5, 4, 8), xpd = TRUE)
x_ac <- breaks_table
y_asc <- c(0, Ni_asc)
y_des <- c(Ni_desc, 0)
x_range <- range(x_ac)
y_range <- c(0, max(c(y_asc, y_des)))
plot(x_ac, y_asc, type = "o", col = col_principal, lwd = 2, pch = 19,
main = "Gráfica N°6: Ojivas de la elevación",
xlab = "Elevación (ft s.n.m.)", ylab = "Frecuencia acumulada (n° de pozos)",
xlim = x_range, ylim = y_range, axes = FALSE, frame.plot = FALSE, cex.main = 0.95)
axis(1, at = round(breaks_table), labels = format(round(breaks_table), scientific = FALSE),
las = 2, cex.axis = 0.6)
axis(2, at = pretty(y_asc), labels = format(pretty(y_asc), scientific = FALSE))
lines(x_ac, y_des, type = "o", col = col_acento, lwd = 2, pch = 19)
legend("right", legend = c("Ascendente (Ni↑)", "Descendente (Ni↓)"),
col = c(col_principal, col_acento), lty = 1, pch = 19, cex = 0.75, lwd = 2,
inset = c(-0.18, 0), bty = "n")
grid(col = col_grid, lty = "dotted")
# Measures of central tendency, dispersion and shape
media_val <- mean(Variable)
mediana_val <- median(Variable)
freq_max <- max(TDF_Elev$ni)
modas_calc <- TDF_Elev$MC[TDF_Elev$ni == freq_max]
moda_txt <- paste(round(modas_calc, 1), collapse = ", ")
rango_txt <- paste0("[", round(min_val, 1), "; ", round(max_val, 1), "]")
varianza_val <- var(Variable)
sd_val <- sd(Variable)
cv_val <- (sd_val / abs(media_val)) * 100
asimetria_val <- skewness(Variable, type = 2)
curtosis_val <- kurtosis(Variable, type = 2)
vals_atipicos <- boxplot.stats(Variable)$out
num_atipicos <- length(vals_atipicos)
status_atipicos <- if (num_atipicos > 0) {
paste0(num_atipicos, " [", round(min(vals_atipicos), 1), "; ", round(max(vals_atipicos), 1), "]")
} else { "0 (Sin atipicos)" }
df_resumen <- data.frame(
"Variable" = "Elevacion (ft s.n.m.)",
"Rango" = rango_txt,
"Media" = media_val,
"Mediana" = mediana_val,
"Moda" = moda_txt,
"Varianza" = varianza_val,
"Desv_Std" = sd_val,
"CV_Porc" = cv_val,
"Asimetria" = asimetria_val,
"Curtosis" = curtosis_val,
"Atipicos" = status_atipicos,
check.names = FALSE
)
df_resumen %>%
gt() %>%
tab_header(
title = md("**RESUMEN ESTADÍSTICO Y MEDIDAS DESCRIPTIVAS**"),
subtitle = "Indicadores de la elevación del terreno de los pozos (ft) — Nueva York"
) %>%
tab_source_note(source_note = "Autor: DALLYANA") %>%
fmt_number(columns = c(Media, Mediana, Varianza, Desv_Std, CV_Porc, Curtosis), decimals = 2) %>%
fmt_number(columns = c(Asimetria), decimals = 4) %>%
cols_label(
Variable = "Variable", Rango = "Rango Total",
Media = "Media (X)", Mediana = "Mediana (Me)", Moda = "Moda (Mo)",
Varianza = "Varianza (S2)", Desv_Std = "Desv. Est. (S)", CV_Porc = "C.V. (%)",
Asimetria = "Asimetria (As)", Curtosis = "Curtosis (K)", Atipicos = "Outliers [Intervalo]"
) %>%
cols_align(align = "center", columns = everything()) %>%
tab_style(
style = list(cell_fill(color = col_principal), cell_text(color = "white", weight = "bold")),
locations = cells_title()
) %>%
tab_style(
style = list(cell_fill(color = "#148F77"), cell_text(color = "white", weight = "bold")),
locations = cells_column_labels()
) %>%
opt_table_font(font = google_font("Roboto")) %>%
tab_options(
table.font.size = px(13),
heading.align = "left",
data_row.padding = px(9),
table.border.top.color = col_principal,
table.border.bottom.color = col_principal,
column_labels.border.bottom.color = col_principal
)

RESUMEN ESTADÍSTICO Y MEDIDAS DESCRIPTIVAS
Indicadores de la elevación del terreno de los pozos (ft) — Nueva York

| Variable | Rango Total | Media (X) | Mediana (Me) | Moda (Mo) | Varianza (S2) | Desv. Est. (S) | C.V. (%) | Asimetria (As) | Curtosis (K) | Outliers [Intervalo] |
|---|---|---|---|---|---|---|---|---|---|---|
| Elevacion (ft s.n.m.) | [3; 3500] | 1,432.98 | 1,495.00 | 1518.4 | 291,239.03 | 539.67 | 37.66 | −0.2732 | −0.73 | 1 [3500; 3500] |

Autor: DALLYANA
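The outlier count in the last column comes from `boxplot.stats(Variable)$out`. As a minimal sketch of what that call does with its default `coef = 1.5`, the toy vector below (hypothetical values, not from the dataset) flags any point lying more than 1.5 hinge spreads beyond the hinges returned by `fivenum()`:

```r
# Hypothetical toy elevations (ft); illustrative only, not from the report's data.
x <- c(3, 500, 900, 1200, 1400, 1500, 1600, 1800, 2000, 2300, 9000)

# Lower and upper hinges (elements 2 and 4 of Tukey's five-number summary).
hinges <- fivenum(x)[c(2, 4)]
iqr_h <- diff(hinges)

# Fences at 1.5 hinge spreads beyond each hinge.
lower_fence <- hinges[1] - 1.5 * iqr_h
upper_fence <- hinges[2] + 1.5 * iqr_h

# Points outside the fences, and the same result from boxplot.stats().
manual_out <- x[x < lower_fence | x > upper_fence]
builtin_out <- boxplot.stats(x)$out

print(manual_out)
print(identical(manual_out, builtin_out))
```

In this toy case the hinges are 1050 and 1900, so the fences sit at -225 and 3175: only 9000 is flagged, while the low value 3 stays inside the fences.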
The Elevation variable ranges from 3 to 3500 ft above sea level, with a mean of about 1433 ft and a median of 1495 ft. Its standard deviation is 539.7 ft, which makes it a heterogeneous variable (C.V. = 37.66%). Values concentrate in the upper-middle part of the distribution (the modal class runs from 1401.8 to 1634.9 ft), and there is 1 outlier, at 3500 ft.
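The coefficient of variation quoted above follows directly from the summary table, in the same way the code computes it (`sd_val / abs(media_val) * 100`). A quick check with the rounded table values:

```latex
\begin{aligned}
S  &= \sqrt{S^2} = \sqrt{291239.03} \approx 539.67\ \text{ft}, \\
CV &= \frac{S}{|\bar{X}|} \times 100
    = \frac{539.67}{1432.98} \times 100 \approx 37.66\,\%.
\end{aligned}
```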
The distribution is approximately symmetric with a slight negative skew (As = −0.2732): most New York wells sit on medium-to-high ground, around 1495 ft above sea level, typical of the Allegheny Plateau in the west of the state (Southern Tier), where oil and gas activity is concentrated. The mean (1433 ft) lies below the median (1495 ft), which is consistent with that mild left skew rather than with perfect symmetry.