The dataset of hydrocarbon leases for the state of Kansas, USA, as recorded by the Kansas Geological Survey, is loaded.
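library(readr)       # read_delim()
library(dplyr)       # %>%, mutate(), filter(), pull(), bind_rows()
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), add_header_above(), row_spec()
ruta_csv <- "C:/Users/luisq/OneDrive/Desktop/ESTADISTICA/kansas.csv"
datos <- read_delim(ruta_csv, delim = ";", show_col_types = FALSE)
cat("Dataset loaded successfully.\n")
## Dataset loaded successfully.
cat("Total records (rows):", nrow(datos), "\n")
## Total records (rows): 104173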
The TOWNSHIP variable represents the north-south division of Kansas's rectangular survey system. Only valid integer values in the range 1–35 are kept.
x_raw <- datos %>%
mutate(TWP = suppressWarnings(as.integer(TOWNSHIP))) %>%
filter(!is.na(TWP), TWP >= 1, TWP <= 35) %>%
pull(TWP)
n_unique <- length(unique(x_raw))
cat("Valid observations:", length(x_raw), "\n")
## Valid observations: 97708
cat("Unique values:", n_unique, "\n")
## Unique values: 35
cat("Since there are", n_unique, "> 10 unique values, the data are grouped into class intervals.\n")
## Since there are 35 > 10 unique values, the data are grouped into class intervals.
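To make the filtering step concrete, the following minimal sketch applies the same rule (coerce to integer, drop `NA`, keep 1 to 35) to a made-up vector. The vector `toy_raw` and its entries are hypothetical and do not come from the Kansas file. Base R is used instead of the dplyr pipeline so that the snippet runs on its own.

```r
# Hypothetical raw TOWNSHIP entries (illustrative only, not taken from the dataset)
toy_raw <- c("12", "35", "0", "36", "7S", NA, "1")

# Non-numeric strings such as "7S" become NA; the coercion warning is silenced
twp <- suppressWarnings(as.integer(toy_raw))

# Keep only non-missing townships inside the valid range 1..35
twp_valid <- twp[!is.na(twp) & twp >= 1 & twp <= 35]

print(twp_valid)
```

In this toy case only 12, 35 and 1 survive: 0 and 36 fall outside the range, and "7S" and the missing entry are dropped as `NA`.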
| Criterion | Classification |
|---|---|
| Type | Quantitative, continuous (grouped; more than 10 unique values) |
| Scale | Ratio |
| Variable | TOWNSHIP (north-south division, survey system) |
| Range | 1 to 35 |
| Source | Kansas Geological Survey, Kansas, USA |
Justification: Although township takes integer values, it has 35 unique values, which exceeds the threshold of 10. By statistical convention it is therefore grouped into class intervals and treated as a continuous quantitative variable. The scale is ratio, since zero has an absolute meaning.
\[k = \left\lceil 1 + 3.322 \cdot \log_{10}(n) \right\rceil \qquad c = \left\lceil \frac{\text{Range}}{k} \right\rceil\]
x <- x_raw
n <- length(x)
k <- ceiling(1 + 3.322 * log10(n))
rango <- max(x) - min(x)
c_amp <- ceiling(rango / k)
cat("n =", n, "| k =", k, "| Range =", rango, "| Width c =", c_amp, "\n")
## n = 97708 | k = 18 | Range = 34 | Width c = 2
lim_inf <- min(x) + (0:(k - 1)) * c_amp
lim_sup <- lim_inf + c_amp
lim_sup[k] <- max(x) + 1 # closes the last interval
mc <- (lim_inf + lim_sup - 1) / 2
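As a quick check of the class count and width, the rule can be wrapped in a small helper. This is only a sketch: `sturges_classes` is a hypothetical name that is not part of the original script, and it simply repeats the two `ceiling()` calls used above with the reported n = 97708 and range = 34.

```r
# Hypothetical helper: Sturges' class count and integer class width
sturges_classes <- function(n, data_range) {
  k <- ceiling(1 + 3.322 * log10(n))   # number of classes, rounded up
  c_amp <- ceiling(data_range / k)     # class width, rounded up
  list(k = k, c_amp = c_amp)
}

res <- sturges_classes(n = 97708, data_range = 34)
print(res$k)       # 18
print(res$c_amp)   # 2

# k classes of width c_amp must span at least the observed range
stopifnot(res$k * res$c_amp >= 34)
```

Because the width is rounded up, 18 classes of width 2 would span 36 units, slightly more than the range of 34, which is why the last class holds only the value 35. Base R's `nclass.Sturges()` uses the equivalent form `ceiling(log2(n) + 1)`, since 3.322 is approximately 1 / log10(2).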
The frequency table is built for the continuous quantitative variable Township, for the hydrocarbon leases recorded in Kansas, USA, over the available historical period (n = 97,708).
breaks_vec <- c(lim_inf, lim_sup[k])
intervalos <- cut(x, breaks = breaks_vec, right = FALSE, include.lowest = TRUE)
freq_abs <- as.integer(table(intervalos))
li <- lim_inf
ls_real <- lim_sup - 1
hi_dec <- freq_abs / n
Ni_asc <- cumsum(freq_abs)
Hi_asc <- cumsum(hi_dec)
Ni_desc <- n - c(0, head(Ni_asc, -1))
Hi_desc <- 1 - c(0, head(Hi_asc, -1))
etiq_li <- paste0("[", li)
etiq_ls <- paste0(ls_real, ")")
etiq_ls[k] <- paste0(max(x), "]")
tabla_df <- data.frame(
Li = etiq_li,
Ls = etiq_ls,
MC = round(mc, 1),
ni = freq_abs,
hi_pct = sprintf("%.2f%%", hi_dec * 100),
hi_real = sprintf("%.4f", hi_dec),
Ni_a = Ni_asc,
Hi_a = sprintf("%.4f", Hi_asc),
Ni_d = Ni_desc,
Hi_d = sprintf("%.4f", Hi_desc),
stringsAsFactors = FALSE
)
total_row <- data.frame(
Li = "TOTAL", Ls = "—", MC = NA, ni = n,
hi_pct = "100.00%", hi_real = "1.0000",
Ni_a = NA, Hi_a = "—", Ni_d = NA, Hi_d = "—",
stringsAsFactors = FALSE
)
tabla_final <- bind_rows(tabla_df, total_row) %>%
mutate(
MC = ifelse(is.na(MC), "—", as.character(MC)),
Ni_a = ifelse(is.na(Ni_a), "—", as.character(Ni_a)),
Ni_d = ifelse(is.na(Ni_d), "—", as.character(Ni_d))
)
kable(
  tabla_final,
  caption = paste0(
    "Table 1: Frequency Distribution of the Continuous Quantitative Variable Township, ",
    "recorded in the hydrocarbon leases of the state of Kansas, USA, ",
    "available historical period (n = ", format(n, big.mark = ","), " valid records)."
  ),
  col.names = c(
    "Lower Limit [", "Upper Limit )",
    "Class Mark",
    "ni (abs. freq.)", "hi %", "hi (decimal)",
    "Ni ↑ (cum. abs.)", "Hi ↑ (cum. rel.)",
    "Ni ↓ (cum. abs.)", "Hi ↓ (cum. rel.)"
  ),
  align = rep("c", 10)
) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "bordered"),
    full_width = TRUE, font_size = 12
  ) %>%
  add_header_above(c("Interval" = 2, " " = 1, " " = 1,
                     "hi" = 2, "Cumulative ↑" = 2, "Cumulative ↓" = 2)) %>%
  row_spec(0, bold = TRUE, background = "#d3d3d3", color = "black") %>%
  row_spec(nrow(tabla_final), bold = TRUE, background = "#a9a9a9", color = "black")
| Lower Limit [ | Upper Limit ) | Class Mark | ni (abs. freq.) | hi % | hi (decimal) | Ni ↑ (cum. abs.) | Hi ↑ (cum. rel.) | Ni ↓ (cum. abs.) | Hi ↓ (cum. rel.) |
|---|---|---|---|---|---|---|---|---|---|
| [1 | 2) | 1.5 | 1400 | 1.43% | 0.0143 | 1400 | 0.0143 | 97708 | 1.0000 |
| [3 | 4) | 3.5 | 1430 | 1.46% | 0.0146 | 2830 | 0.0290 | 96308 | 0.9857 |
| [5 | 6) | 5.5 | 1232 | 1.26% | 0.0126 | 4062 | 0.0416 | 94878 | 0.9710 |
| [7 | 8) | 7.5 | 1971 | 2.02% | 0.0202 | 6033 | 0.0617 | 93646 | 0.9584 |
| [9 | 10) | 9.5 | 2934 | 3.00% | 0.0300 | 8967 | 0.0918 | 91675 | 0.9383 |
| [11 | 12) | 11.5 | 2723 | 2.79% | 0.0279 | 11690 | 0.1196 | 88741 | 0.9082 |
| [13 | 14) | 13.5 | 5297 | 5.42% | 0.0542 | 16987 | 0.1739 | 86018 | 0.8804 |
| [15 | 16) | 15.5 | 5529 | 5.66% | 0.0566 | 22516 | 0.2304 | 80721 | 0.8261 |
| [17 | 18) | 17.5 | 5723 | 5.86% | 0.0586 | 28239 | 0.2890 | 75192 | 0.7696 |
| [19 | 20) | 19.5 | 6229 | 6.38% | 0.0638 | 34468 | 0.3528 | 69469 | 0.7110 |
| [21 | 22) | 21.5 | 5288 | 5.41% | 0.0541 | 39756 | 0.4069 | 63240 | 0.6472 |
| [23 | 24) | 23.5 | 6333 | 6.48% | 0.0648 | 46089 | 0.4717 | 57952 | 0.5931 |
| [25 | 26) | 25.5 | 6586 | 6.74% | 0.0674 | 52675 | 0.5391 | 51619 | 0.5283 |
| [27 | 28) | 27.5 | 7542 | 7.72% | 0.0772 | 60217 | 0.6163 | 45033 | 0.4609 |
| [29 | 30) | 29.5 | 10348 | 10.59% | 0.1059 | 70565 | 0.7222 | 37491 | 0.3837 |
| [31 | 32) | 31.5 | 10900 | 11.16% | 0.1116 | 81465 | 0.8338 | 27143 | 0.2778 |
| [33 | 34) | 33.5 | 13056 | 13.36% | 0.1336 | 94521 | 0.9674 | 16243 | 0.1662 |
| [35 | 35] | 35 | 3187 | 3.26% | 0.0326 | 97708 | 1.0000 | 3187 | 0.0326 |
| TOTAL | — | — | 97708 | 100.00% | 1.0000 | — | — | — | — |
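The two cumulative columns can be verified on a small example. The counts in `ni_toy` are invented for illustration; the formulas are the same `cumsum()` expressions used to build the frequency table above.

```r
# Hypothetical class counts (illustrative only)
ni_toy <- c(2, 5, 3)
n_toy <- sum(ni_toy)

# Ascending: observations in this class or any earlier one
Ni_up <- cumsum(ni_toy)

# Descending: observations in this class or any later one
Ni_down <- n_toy - c(0, head(Ni_up, -1))

print(Ni_up)     # 2 7 10
print(Ni_down)   # 10 8 3

# Equivalent definition: accumulate from the last class backwards
stopifnot(all(Ni_down == rev(cumsum(rev(ni_toy)))))
```

The same identity can be read off the real table: the ascending count of one class plus the descending count of the next class equals n, for example 1400 + 96308 = 97708.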
n_x <- length(x)
media <- mean(x)
mediana <- median(x)
idx_moda <- which.max(freq_abs)
moda_mc <- mc[idx_moda]
varianza <- var(x)
desv_std <- sd(x)
cv <- (desv_std / media) * 100
rango_val <- max(x) - min(x)
q1 <- as.numeric(quantile(x, 0.25))
q3 <- as.numeric(quantile(x, 0.75))
iqr_val <- IQR(x)
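# Pearson's second skewness coefficient (median-based)
asimetria <- (3 * (media - mediana)) / desv_std
# Moment kurtosis, not excess: fourth central moment (divided by n) over s^4
curtosis_val <- (sum((x - media)^4) / n_x) / (desv_std^4)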
indicadores <- data.frame(
  Indicador = c(
    "Sample size (n)", "Minimum", "Maximum", "Range",
    "Mean", "Median", "Mode (modal class mark)",
    "Variance (s²)", "Standard deviation (s)", "Coeff. of variation (CV%)",
    "Quartile 1 (Q1)", "Quartile 3 (Q3)", "Interquartile range (IQR)",
    "Pearson skewness", "Kurtosis"
  ),
  Valor = c(
    format(n_x, big.mark = ","), min(x), max(x), rango_val,
    round(media, 4), mediana, round(moda_mc, 2),
    round(varianza, 4), round(desv_std, 4), paste0(round(cv, 2), "%"),
    q1, q3, iqr_val,
    round(asimetria, 4), round(curtosis_val, 4)
  ),
  stringsAsFactors = FALSE
)
kable(
  indicadores,
  caption = "Table 2: Statistical Indicators of the Township Variable, hydrocarbon leases, Kansas, USA.",
  col.names = c("Indicator", "Value"),
  align = c("l", "c")
) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "bordered"),
    full_width = FALSE, font_size = 12
  ) %>%
  row_spec(0, bold = TRUE, background = "#d3d3d3", color = "black")
| Indicator | Value |
|---|---|
| Sample size (n) | 97,708 |
| Minimum | 1 |
| Maximum | 35 |
| Range | 34 |
| Mean | 23.5815 |
| Median | 25 |
| Mode (modal class mark) | 33.5 |
| Variance (s²) | 74.6058 |
| Standard deviation (s) | 8.6375 |
| Coeff. of variation (CV%) | 36.63% |
| Quartile 1 (Q1) | 17 |
| Quartile 3 (Q3) | 31 |
| Interquartile range (IQR) | 14 |
| Pearson skewness | -0.4927 |
| Kurtosis | 2.4578 |
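The two shape indicators can be reproduced with short helpers. This is a sketch that assumes the formulas are exactly those in the script: `pearson_skew2` and `moment_kurtosis` are hypothetical names, and the toy vector is not from the dataset.

```r
# Pearson's second (median-based) skewness coefficient
pearson_skew2 <- function(x) 3 * (mean(x) - median(x)) / sd(x)

# Fourth central moment (divided by n) over the fourth power of the sample sd;
# this is not an excess kurtosis, so a normal distribution gives roughly 3
moment_kurtosis <- function(x) {
  m <- mean(x)
  (sum((x - m)^4) / length(x)) / sd(x)^4
}

toy <- c(1, 2, 3, 4, 5)          # symmetric toy data
print(pearson_skew2(toy))        # 0
print(moment_kurtosis(toy))      # 1.088

# Plugging in the reported mean, median and sd reproduces the tabulated skewness
skew_reported <- 3 * (23.5815 - 25) / 8.6375
print(round(skew_reported, 4))   # -0.4927
```

The negative sign reflects a mean (23.58) below the median (25), that is, a longer tail toward low township numbers. The kurtosis of 2.46 lies below the normal reference of about 3.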
The Township variable gives the north-south position of each lease in Kansas (1 = far north, 35 = far south). Based on 97,708 valid records grouped into 18 intervals of width 2:
grises <- gray(seq(0.25, 0.80, length.out = k))
par(mar = c(5, 6, 6, 2))
h_obj <- hist(x, breaks = breaks_vec, right = FALSE, plot = FALSE)  # right = FALSE matches the [a, b) classes built with cut()
plot(h_obj,
col = grises, border = "black",
freq = TRUE, main = "", xlab = "", ylab = "", las = 1)
mtext("Absolute Frequency (ni)", side = 2, line = 4.5, cex = 1)
mtext("Township", side = 1, line = 3.5, cex = 1)
mtext(
  "Figure 1: Histogram of Absolute Frequencies of the Township Variable,\nhydrocarbon leases, Kansas, USA",
  side = 3, line = 3, cex = 0.9, font = 2
)
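One detail is worth checking when a histogram shares its breaks with a frequency table: `hist()` closes classes on the right by default, whereas `cut(..., right = FALSE)` closes them on the left. The following sketch, on invented data, shows how the counts can differ and that passing `right = FALSE` to `hist()` brings them into agreement.

```r
# Hypothetical integer data and breaks (illustrative only)
x_toy <- c(1, 2, 3, 3, 4, 5, 5, 5)
breaks_toy <- c(1, 3, 5, 6)

# Left-closed classes [1,3), [3,5), [5,6], as in the frequency table
counts_cut <- as.integer(table(
  cut(x_toy, breaks = breaks_toy, right = FALSE, include.lowest = TRUE)
))

# hist() default: right-closed classes [1,3], (3,5], (5,6]
counts_hist_default <- hist(x_toy, breaks = breaks_toy, plot = FALSE)$counts

# hist() with right = FALSE: same classes as cut() above
counts_hist_left <- hist(x_toy, breaks = breaks_toy, right = FALSE,
                         plot = FALSE)$counts

print(counts_cut)            # 2 3 3
print(counts_hist_default)   # 4 4 0
print(counts_hist_left)      # 2 3 3
```

With integer data that fall exactly on the breaks, the default shifts every boundary value into the class below and leaves the last class empty.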
mc_ext <- c(mc[1] - c_amp, mc, mc[k] + c_amp)
ni_ext <- c(0, freq_abs, 0)
par(mar = c(5, 6, 6, 2))
plot(mc_ext, ni_ext,
type = "n", xlab = "", ylab = "", main = "",
ylim = c(0, max(ni_ext) * 1.12), las = 1)
polygon(c(mc_ext[1], mc_ext, tail(mc_ext,1)),
c(0, ni_ext, 0), col = "gray80", border = NA)
lines(mc_ext, ni_ext, col = "black", lwd = 2)
points(mc_ext, ni_ext, pch = 16, col = "black", cex = 0.9)
mtext("Absolute Frequency (ni)", side = 2, line = 4.5, cex = 1)
mtext("Class Mark (Township)", side = 1, line = 3.5, cex = 1)
mtext(
  "Figure 2: Frequency Polygon of the Township Variable,\nhydrocarbon leases, Kansas, USA",
  side = 3, line = 3, cex = 0.9, font = 2
)
par(mar = c(5, 4, 6, 2))
boxplot(x,
col = "gray75", border = "black",
horizontal = TRUE, outline = TRUE, pch = 16, cex = 0.5,
main = "", xlab = "", ylab = "")
mtext("Township", side = 1, line = 3.5, cex = 1)
mtext(
  "Figure 3: Boxplot of the Township Variable,\nhydrocarbon leases, Kansas, USA",
  side = 3, line = 3, cex = 0.9, font = 2
)
text(q1, 1.38, labels = paste0("Q1=", q1), cex = 0.8)
text(mediana, 0.62, labels = paste0("Med=", mediana), cex = 0.8)
text(q3, 1.38, labels = paste0("Q3=", q3), cex = 0.8)
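Whether the boxplot draws any outlier points can be worked out from the quartiles alone. The sketch below uses the reported Q1 = 17 and Q3 = 31 together with the usual 1.5 × IQR whisker rule (the default `range = 1.5` in `boxplot()`). Note that `boxplot()` computes its hinges with `fivenum()`, which can differ slightly from `quantile()`, so this should be read as an approximation; the `_rep` variable names are hypothetical.

```r
# Reported quartiles from the indicators table
q1_rep <- 17
q3_rep <- 31
iqr_rep <- q3_rep - q1_rep                 # 14

# Whisker fences under the 1.5 * IQR rule
lower_fence <- q1_rep - 1.5 * iqr_rep      # -4
upper_fence <- q3_rep + 1.5 * iqr_rep      # 52

# Every valid township lies in 1..35, so compare the extremes with the fences
no_outliers <- (1 >= lower_fence) && (35 <= upper_fence)

print(c(lower_fence, upper_fence))   # -4 52
print(no_outliers)                   # TRUE
```

Since both extremes fall inside the fences, the whiskers would be expected to reach the minimum (1) and maximum (35) with no individual outlier points.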
x_asc <- c(li[1], ls_real)
y_asc <- c(0, Ni_asc)
par(mar = c(5, 7, 6, 2))
plot(x_asc, y_asc,
type = "b", pch = 16, lwd = 2, col = "black",
ylim = c(0, n * 1.05), xlab = "", ylab = "", main = "", las = 1)
grid(col = "gray85", lty = "dotted")
mtext("Ascending Cumulative Absolute Freq. Ni ↑", side = 2, line = 5, cex = 0.9)
mtext("Township", side = 1, line = 3.5, cex = 1)
mtext(
  "Figure 4: Ascending Ogive of the Township Variable,\nhydrocarbon leases, Kansas, USA",
  side = 3, line = 3, cex = 0.9, font = 2
)
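x_desc <- c(li, lim_sup[k])  # lower class limits, then the closing break (max + 1)
y_desc <- c(Ni_desc, 0)      # Ni_desc[i] counts values >= li[i]; none lie at or above the closing break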
par(mar = c(5, 7, 6, 2))
plot(x_desc, y_desc,
type = "b", pch = 16, lwd = 2, col = "black",
ylim = c(0, n * 1.05), xlab = "", ylab = "", main = "", las = 1)
grid(col = "gray85", lty = "dotted")
mtext("Descending Cumulative Absolute Freq. Ni ↓", side = 2, line = 5, cex = 0.9)
mtext("Township", side = 1, line = 3.5, cex = 1)
mtext(
  "Figure 5: Descending Ogive of the Township Variable,\nhydrocarbon leases, Kansas, USA",
  side = 3, line = 3, cex = 0.9, font = 2
)
Author: Leslye Quinchiguango. Statistical Analysis, Kansas Hydrocarbon Leases Dataset