Inicialmente se realiza una compresión de la base de datos de 10.000 transacciones de agua a lo largo de Chile, el análisis conglomerará las transacciones por cuencas en un período de años desde 1993 hasta 2014.
Se trabajará con una base de datos de la SISS (Superintendencia de Servicios Sanitarios) la que comprende un total de 15.000 transacciones de derechos de agua registradas entre los años 1993 y 2014. Los datos disponibles abarcan una amplia gama de diversidad geográfica y transaccional. Incluye información de transacciones en 12 de las 16 regiones de Chile, cubriendo 150 comunas y 26 cuencas hidrográficas, representando una parte sustancial de la variabilidad climática de Chile, aunque excluye las zonas climáticas extremas como el Desierto de Atacama, los Campos de Hielo Sur y la Estepa Patagónica. Esta base de datos será sometida a un proceso de validación y limpieza de datos, buscando el máximo número de entradas válidas. Cada transacción será cuantificada en términos de tasas de flujo (medidas en litros por segundo (L/s)), con su valor monetario. Los valores monetarios de las transacciones serán transformados a la Unidad de Fomento (UF), estandarizando los valores y transformándolos a precio unitario (UF por 1 L/s), con base en la fecha de la transacción, esto considera que la UF es una unidad de cuenta usada en Chile que se ajusta constantemente por inflación, asegurando un análisis económico consistente y comparable. Por otra parte, las transacciones en el conjunto de datos involucran varios tipos de partes, incluyendo personas naturales, jurídicas y combinaciones de estas, reflejando una amplia dispersión de los valores y los montos de las transacciones.
datos_general <- read.csv("D://Tesis//Datos//Data_Final_240116v2.csv", sep=";", dec=".")
cuencas_centroides <- read.csv("D://Tesis//Datos//Cuencas_coord.csv", sep=";", dec=".")
library(dplyr)
library(tidyr)
library(kableExtra)
library(ggplot2)
library(gridExtra)
La base solo considerará la cuenca, el monto de transacción y el año en el cual fue realizada, por lo que se creará un marco de datos aparte para este efecto.
data <- datos_general %>% select("Año","PrecioUF","Cuenca")
summary(data)
## Año PrecioUF Cuenca
## Min. :1993 Min. : 0.0 Min. : 13.00
## 1st Qu.:2010 1st Qu.: 53.4 1st Qu.: 45.00
## Median :2011 Median : 158.3 Median : 54.00
## Mean :2011 Mean : 854.2 Mean : 52.18
## 3rd Qu.:2013 3rd Qu.: 436.0 3rd Qu.: 57.00
## Max. :2014 Max. :367535.6 Max. :109.00
el precio de las transacciones por año se distribuye como se muestra en el siguiente gráfico
# función para imprimir tablas con formato
print_table <- function(data, title) {
knitr::asis_output(
paste0("## ", title, "\n\n",
knitr::kable(data, format = "html", table.attr = "style='width:100%;'") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F) %>%
scroll_box(width = "100%", height = "400px")
)
)
}
Conteo de transacciones por año por cuenca
data_count <- data %>%
group_by(Cuenca, Año) %>%
summarise(ConteoTransacciones = n()) %>%
spread(Año, ConteoTransacciones, fill = 0)
print_table(data_count, "Conteo de transacciones")
| Cuenca | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 13 | 6 | 13 | 17 | 0 | 0 | 0 |
| 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 21 | 4 | 2 | 13 | 8 | 7 | 7 | 9 | 0 | 0 | 0 |
| 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 15 | 15 | 26 | 25 |
| 41 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 3 | 2 | 1 |
| 43 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 7 | 5 | 1 | 6 | 349 | 365 | 400 | 431 | 339 |
| 44 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 39 | 70 | 58 | 43 | 43 |
| 45 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 2 | 0 | 283 | 240 | 251 | 190 | 167 |
| 47 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 92 | 97 | 104 | 112 | 98 |
| 48 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 14 | 16 | 19 | 16 |
| 51 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 5 | 13 | 14 | 12 | 0 |
| 52 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17 | 8 | 29 | 29 | 9 | 0 |
| 53 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 4 | 14 | 25 | 21 | 0 |
| 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 232 | 220 | 280 | 290 | 228 | 0 |
| 55 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 8 | 16 | 37 | 38 | 0 |
| 57 | 3 | 6 | 2 | 3 | 1 | 1 | 2 | 4 | 4 | 2 | 9 | 7 | 5 | 16 | 9 | 630 | 771 | 799 | 835 | 694 | 1 |
| 60 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 10 | 8 | 4 | 94 | 93 | 107 | 92 | 84 |
| 61 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 2 | 4 | 3 | 0 | 3 | 4 |
| 71 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 5 | 6 | 11 | 6 |
| 81 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 6 | 9 | 5 | 5 |
| 82 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 3 | 1 | 1 | 1 | 0 | 5 | 0 | 2 | 4 | 1 |
| 83 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17 | 16 | 31 | 52 | 33 |
| 101 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 | 13 | 18 | 6 | 23 |
| 103 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 4 | 4 | 1 | 5 | 2 | 3 | 6 | 2 | 4 |
| 104 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 2 | 2 |
| 106 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 2 | 0 |
| 109 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
Al existir pocas o nulas transacciones registradas se optó por mantener los datos de los años 2005 a 2014 para realizar las matrices siguientes, a su vez, se mantuvieron solo las cuencas que presentaban en total más de 30 transacciones entre los años mencionados para poder efectuar labores de imputación de datos faltantes.
data_count <- data %>%
filter(Cuenca %in% cuencas) %>% # Filtrar por cuencas
group_by(Cuenca, Año) %>%
summarise(ConteoTransacciones = n()) %>%
spread(Año, ConteoTransacciones, fill = 0) %>%
select(Cuenca, `2005`:`2014`)
print_table(data_count,"Media de las transacciones")
| Cuenca | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 |
|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 1 | 0 | 0 | 13 | 6 | 13 | 17 | 0 | 0 | 0 |
| 17 | 4 | 2 | 13 | 8 | 7 | 7 | 9 | 0 | 0 | 0 |
| 21 | 0 | 0 | 0 | 0 | 0 | 11 | 15 | 15 | 26 | 25 |
| 43 | 2 | 7 | 5 | 1 | 6 | 349 | 365 | 400 | 431 | 339 |
| 44 | 0 | 0 | 0 | 0 | 0 | 39 | 70 | 58 | 43 | 43 |
| 45 | 2 | 0 | 1 | 2 | 0 | 283 | 240 | 251 | 190 | 167 |
| 47 | 0 | 0 | 0 | 0 | 0 | 92 | 97 | 104 | 112 | 98 |
| 48 | 0 | 0 | 0 | 0 | 0 | 9 | 14 | 16 | 19 | 16 |
| 51 | 0 | 0 | 0 | 0 | 7 | 5 | 13 | 14 | 12 | 0 |
| 52 | 0 | 0 | 0 | 0 | 17 | 8 | 29 | 29 | 9 | 0 |
| 53 | 0 | 0 | 0 | 0 | 7 | 4 | 14 | 25 | 21 | 0 |
| 54 | 0 | 0 | 0 | 0 | 232 | 220 | 280 | 290 | 228 | 0 |
| 55 | 0 | 0 | 0 | 0 | 6 | 8 | 16 | 37 | 38 | 0 |
| 57 | 7 | 5 | 16 | 9 | 630 | 771 | 799 | 835 | 694 | 1 |
| 60 | 0 | 4 | 10 | 8 | 4 | 94 | 93 | 107 | 92 | 84 |
| 71 | 0 | 0 | 0 | 0 | 0 | 4 | 5 | 6 | 11 | 6 |
| 81 | 0 | 0 | 0 | 0 | 0 | 7 | 6 | 9 | 5 | 5 |
| 83 | 0 | 0 | 0 | 0 | 0 | 17 | 16 | 31 | 52 | 33 |
| 101 | 0 | 0 | 0 | 0 | 0 | 19 | 13 | 18 | 6 | 23 |
| 103 | 4 | 4 | 4 | 1 | 5 | 2 | 3 | 6 | 2 | 4 |
# 2. Dataframe con la media de las transacciones en UF
data_mean <- data %>%
filter(Cuenca %in% cuencas) %>% # Filtrar por cuencas
group_by(Cuenca, Año) %>%
summarise(MediaTransacciones = round(mean(PrecioUF, na.rm = TRUE), 2)) %>%
spread(Año, MediaTransacciones, fill = NA) %>%
select(Cuenca, `2005`:`2014`)
print_table(data_mean,"Media de las transacciones")
| Cuenca | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 |
|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 1.29 | NA | NA | 166.12 | 460.99 | 392.59 | 582.09 | NA | NA | NA |
| 17 | 4090.04 | 693.36 | 309.99 | 288.41 | 446.97 | 704.31 | 794.59 | NA | NA | NA |
| 21 | NA | NA | NA | NA | NA | 1829.60 | 2269.57 | 3541.83 | 2490.15 | 2782.23 |
| 43 | 59.67 | 832.17 | 62.04 | 52.71 | 188.12 | 191.84 | 206.33 | 243.01 | 246.57 | 272.25 |
| 44 | NA | NA | NA | NA | NA | 1025.84 | 795.37 | 1066.63 | 1032.39 | 2203.21 |
| 45 | 135.71 | NA | 9.93 | 26.91 | NA | 336.09 | 335.38 | 308.56 | 360.81 | 245.82 |
| 47 | NA | NA | NA | NA | NA | 286.40 | 531.53 | 910.00 | 929.63 | 695.88 |
| 48 | NA | NA | NA | NA | NA | 1387.85 | 7317.37 | 33615.64 | 6184.73 | 10587.06 |
| 51 | NA | NA | NA | NA | 126.96 | 473.96 | 105.13 | 101.83 | 121.06 | NA |
| 52 | NA | NA | NA | NA | 193.06 | 86.76 | 181.67 | 115.75 | 127.20 | NA |
| 53 | NA | NA | NA | NA | 5.79 | 104.05 | 1947.50 | 331.86 | 1481.62 | NA |
| 54 | NA | NA | NA | NA | 759.60 | 436.09 | 2149.14 | 1039.38 | 800.94 | NA |
| 55 | NA | NA | NA | NA | 467.97 | 654.90 | 300.41 | 796.18 | 1607.64 | NA |
| 57 | 106.93 | 1480.77 | 625.59 | 581.74 | 1736.28 | 749.13 | 1072.40 | 1292.00 | 1203.78 | 104.37 |
| 60 | NA | 86.91 | 156.13 | 328.26 | 108.70 | 109.24 | 100.50 | 142.38 | 117.61 | 170.38 |
| 71 | NA | NA | NA | NA | NA | 69.08 | 16.19 | 5.58 | 28.56 | 47.64 |
| 81 | NA | NA | NA | NA | NA | 26.85 | 5.72 | 12.34 | 11.68 | 53.37 |
| 83 | NA | NA | NA | NA | NA | 9.68 | 21.40 | 13.67 | 15.44 | 32.53 |
| 101 | NA | NA | NA | NA | NA | 49.24 | 199.91 | 162.59 | 404.21 | 321.66 |
| 103 | 26.36 | 26.45 | 22.27 | 3.60 | 1.45 | 36.27 | 11.85 | 11.23 | 24.99 | 62.75 |
# 3. Dataframe con la media geométrica de las transacciones en UF
data_geometric_mean <- data %>%
filter(Cuenca %in% cuencas) %>% # Filtrar por cuencas
group_by(Cuenca, Año) %>%
summarise(MediaGeometricaTransacciones = round(exp(mean(log(PrecioUF), na.rm = TRUE)), 2)) %>%
spread(Año, MediaGeometricaTransacciones, fill = NA) %>%
select(Cuenca, `2005`:`2014`)
print_table(data_geometric_mean, "Media geométrica de las transacciones")
| Cuenca | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 |
|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 1.29 | NA | NA | 36.15 | 154.18 | 309.78 | 315.11 | NA | NA | NA |
| 17 | 1090.39 | 550.47 | 164.10 | 124.26 | 232.71 | 363.26 | 401.03 | NA | NA | NA |
| 21 | NA | NA | NA | NA | NA | 1050.90 | 1501.18 | 1316.28 | 2264.85 | 2491.92 |
| 43 | 59.14 | 270.00 | 47.90 | 52.71 | 98.63 | 113.23 | 121.02 | 122.26 | 138.11 | 162.10 |
| 44 | NA | NA | NA | NA | NA | 452.66 | 443.19 | 499.39 | 568.93 | 694.33 |
| 45 | 128.50 | NA | 9.93 | 24.76 | NA | 189.76 | 167.95 | 169.68 | 209.86 | 137.38 |
| 47 | NA | NA | NA | NA | NA | 87.93 | 106.47 | 175.93 | 216.75 | 102.95 |
| 48 | NA | NA | NA | NA | NA | 229.74 | 1209.34 | 1534.88 | 1796.71 | 1576.79 |
| 51 | NA | NA | NA | NA | 111.35 | 210.02 | 73.01 | 71.03 | 84.34 | NA |
| 52 | NA | NA | NA | NA | 49.10 | 40.81 | 114.44 | 72.46 | 99.40 | NA |
| 53 | NA | NA | NA | NA | 4.07 | 61.71 | 426.00 | 30.84 | 190.77 | NA |
| 54 | NA | NA | NA | NA | 60.99 | 67.48 | 97.17 | 63.96 | 83.67 | NA |
| 55 | NA | NA | NA | NA | 153.67 | 294.55 | 112.79 | 485.62 | 598.73 | NA |
| 57 | 47.78 | 183.46 | 245.48 | 91.98 | 196.55 | 205.53 | 210.85 | 230.48 | 250.44 | 104.37 |
| 60 | NA | 67.63 | 55.61 | 21.12 | 46.89 | 26.74 | 43.32 | 40.12 | 40.02 | 50.11 |
| 71 | NA | NA | NA | NA | NA | 38.61 | 7.99 | 2.09 | 4.61 | 10.17 |
| 81 | NA | NA | NA | NA | NA | 6.16 | 2.51 | 2.73 | 4.33 | 7.55 |
| 83 | NA | NA | NA | NA | NA | 5.86 | 4.98 | 4.33 | 8.18 | 13.30 |
| 101 | NA | NA | NA | NA | NA | 4.47 | 50.35 | 31.06 | 64.09 | 103.91 |
| 103 | 5.07 | 2.79 | 10.48 | 3.60 | 0.92 | 24.72 | 2.96 | 2.02 | 24.96 | 35.16 |
# 4. Dataframe con la mediana de las transacciones en UF
data_median <- data %>%
filter(Cuenca %in% cuencas) %>% # Filtrar por cuencas
group_by(Cuenca, Año) %>%
summarise(MedianaTransacciones = round(median(PrecioUF, na.rm = TRUE), 2)) %>%
spread(Año, MedianaTransacciones, fill = NA) %>%
select(Cuenca, `2005`:`2014`)
print_table(data_median, "Mediana de las transacciones")
| Cuenca | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 |
|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 1.29 | NA | NA | 22.27 | 406.22 | 373.15 | 621.18 | NA | NA | NA |
| 17 | 1141.32 | 693.36 | 183.38 | 152.06 | 136.77 | 393.93 | 367.45 | NA | NA | NA |
| 21 | NA | NA | NA | NA | NA | 1481.32 | 2627.73 | 2083.60 | 2445.23 | 2401.49 |
| 43 | 59.67 | 1062.79 | 56.73 | 52.71 | 84.06 | 116.53 | 115.86 | 122.21 | 173.30 | 185.36 |
| 44 | NA | NA | NA | NA | NA | 468.20 | 457.77 | 442.86 | 437.59 | 500.00 |
| 45 | 135.71 | NA | 9.93 | 26.91 | NA | 222.04 | 181.76 | 176.55 | 234.23 | 170.56 |
| 47 | NA | NA | NA | NA | NA | 92.77 | 90.42 | 155.85 | 174.15 | 83.48 |
| 48 | NA | NA | NA | NA | NA | 99.74 | 2507.54 | 2679.60 | 4599.90 | 3751.74 |
| 51 | NA | NA | NA | NA | 118.80 | 118.41 | 100.19 | 88.71 | 120.15 | NA |
| 52 | NA | NA | NA | NA | 39.17 | 81.73 | 123.38 | 110.69 | 104.98 | NA |
| 53 | NA | NA | NA | NA | 6.52 | 108.03 | 1056.23 | 7.28 | 104.50 | NA |
| 54 | NA | NA | NA | NA | 58.09 | 62.58 | 72.23 | 61.22 | 76.84 | NA |
| 55 | NA | NA | NA | NA | 184.00 | 235.76 | 185.58 | 957.85 | 373.49 | NA |
| 57 | 84.13 | 111.41 | 510.47 | 37.52 | 240.35 | 237.25 | 250.00 | 222.44 | 239.95 | 104.37 |
| 60 | NA | 54.63 | 83.01 | 15.11 | 71.67 | 25.94 | 42.43 | 49.32 | 38.63 | 39.20 |
| 71 | NA | NA | NA | NA | NA | 41.37 | 9.14 | 1.91 | 7.29 | 16.61 |
| 81 | NA | NA | NA | NA | NA | 3.67 | 4.23 | 2.21 | 1.95 | 4.35 |
| 83 | NA | NA | NA | NA | NA | 5.07 | 3.08 | 4.17 | 10.06 | 9.69 |
| 101 | NA | NA | NA | NA | NA | 6.81 | 91.63 | 56.42 | 227.21 | 322.62 |
| 103 | 6.13 | 0.85 | 7.50 | 3.60 | 0.70 | 36.27 | 1.21 | 1.87 | 24.99 | 23.67 |
Las matrices muestran la falta de registros para algunos años, por lo que es necesario realizar algún tipo de imputación para poder estimar la relación con el clima mediante el uso de mayor cantidad de datos. Para este caso se realizará mediante la Imputación Múltiple por Ecuaciones Encadenadas (MICE) es una técnica robusta que puede ser efectiva para datos de panel. Esta técnica imputa los datos faltantes varias veces, creando múltiples conjuntos de datos completos. Cada conjunto se analiza por separado, y los resultados se combinan para producir estimaciones finales que tienen en cuenta la incertidumbre de la imputación.
Para realizar la imputación de datos, y considerando la dependencia espacial de las cuencas, se agregará al análisis la distancia entre centroides de cada cuenca, permitiendo visualizar los efectos espaciales entre ellas.
# Paquetes necesarios
library(geosphere)
library(sf)
# Crear un dataframe espacial
coordenadas <- cuencas_centroides %>%
select("COD_CUEN","X","Y") %>%
filter(COD_CUEN %in% cuencas) %>%
rename(Cuenca = COD_CUEN) %>%
mutate(
X = as.numeric(gsub(",",".",X)),
Y = as.numeric(gsub(",",".",Y))
)
str(coordenadas)
## 'data.frame': 20 obs. of 3 variables:
## $ Cuenca: int 13 17 21 43 44 45 47 48 51 52 ...
## $ X : num 414745 461280 493458 351644 269252 ...
## $ Y : num 7945469 7767155 7575721 6683981 6645414 ...
# Convertir el dataframe a un objeto sf
cuencas_sf <- st_as_sf(coordenadas, coords = c("X", "Y"), crs = 32719) # Cambia '32719' al EPSG correcto si es diferente
# Transformar las coordenadas de UTM a lat/long
cuencas_sf <- st_transform(cuencas_sf, crs = 4326)
# Extraer las coordenadas transformadas y agregarlas al dataframe original
cuencas_coords <- st_coordinates(cuencas_sf)
coordenadas$latitud <- cuencas_coords[, "Y"]
coordenadas$longitud <- cuencas_coords[, "X"]
# función de cálculo de distancia entre centroides
calc_dist <- function(lat1, lon1, lat2, lon2) {
dist <- distHaversine(c(lon1, lat1), c(lon2, lat2))
return(dist)
}
# matriz de distancias
dist_matrix <- outer(
1:nrow(coordenadas),
1:nrow(coordenadas),
Vectorize(function(i, j) calc_dist(
coordenadas$latitud[i], coordenadas$longitud[i],
coordenadas$latitud[j], coordenadas$longitud[j]
))
)
# Reemplazar 0 de diagonales con NA para evitar selección de cuenca consigo misma
diag(dist_matrix) <- NA
# encontrar la distancia mínima
dist_df_min <- data.frame(
Cuenca = coordenadas$Cuenca,
min_dist = apply(dist_matrix, 1, function(x) min(x, na.rm = TRUE)),
Cuenca_min_dist = apply(dist_matrix, 1, function(x) coordenadas$Cuenca[which.min(x)])
)
# print(dist_df_min)
library(mice)
library(purrr)
library(caret)
# Renombrar las columnas para evitar caracteres no permitidos
rename_columns <- function(df, prefix) {
colnames(df)[-1] <- paste(prefix, colnames(df)[-1], sep = "_")
return(df)
}
data_count <- rename_columns(data_count, "count")
data_mean <- rename_columns(data_mean, "mean")
data_geometric_mean <- rename_columns(data_geometric_mean, "geom")
data_median <- rename_columns(data_median, "median")
# Combinar las matrices usando reduce del paquete purrr
data_combined <- reduce(list(data_count, data_mean, data_geometric_mean, data_median), function(x, y) merge(x, y, by="Cuenca", all=TRUE))
# Agregar datos espaciales a matrices combinadas
data_combined <- left_join(data_combined, dist_df_min, by = "Cuenca")
print(data_combined)
## Cuenca count_2005 count_2006 count_2007 count_2008 count_2009 count_2010
## 1 13 1 0 0 13 6 13
## 2 17 4 2 13 8 7 7
## 3 21 0 0 0 0 0 11
## 4 43 2 7 5 1 6 349
## 5 44 0 0 0 0 0 39
## 6 45 2 0 1 2 0 283
## 7 47 0 0 0 0 0 92
## 8 48 0 0 0 0 0 9
## 9 51 0 0 0 0 7 5
## 10 52 0 0 0 0 17 8
## 11 53 0 0 0 0 7 4
## 12 54 0 0 0 0 232 220
## 13 55 0 0 0 0 6 8
## 14 57 7 5 16 9 630 771
## 15 60 0 4 10 8 4 94
## 16 71 0 0 0 0 0 4
## 17 81 0 0 0 0 0 7
## 18 83 0 0 0 0 0 17
## 19 101 0 0 0 0 0 19
## 20 103 4 4 4 1 5 2
## count_2011 count_2012 count_2013 count_2014 mean_2005 mean_2006 mean_2007
## 1 17 0 0 0 1.29 NA NA
## 2 9 0 0 0 4090.04 693.36 309.99
## 3 15 15 26 25 NA NA NA
## 4 365 400 431 339 59.67 832.17 62.04
## 5 70 58 43 43 NA NA NA
## 6 240 251 190 167 135.71 NA 9.93
## 7 97 104 112 98 NA NA NA
## 8 14 16 19 16 NA NA NA
## 9 13 14 12 0 NA NA NA
## 10 29 29 9 0 NA NA NA
## 11 14 25 21 0 NA NA NA
## 12 280 290 228 0 NA NA NA
## 13 16 37 38 0 NA NA NA
## 14 799 835 694 1 106.93 1480.77 625.59
## 15 93 107 92 84 NA 86.91 156.13
## 16 5 6 11 6 NA NA NA
## 17 6 9 5 5 NA NA NA
## 18 16 31 52 33 NA NA NA
## 19 13 18 6 23 NA NA NA
## 20 3 6 2 4 26.36 26.45 22.27
## mean_2008 mean_2009 mean_2010 mean_2011 mean_2012 mean_2013 mean_2014
## 1 166.12 460.99 392.59 582.09 NA NA NA
## 2 288.41 446.97 704.31 794.59 NA NA NA
## 3 NA NA 1829.60 2269.57 3541.83 2490.15 2782.23
## 4 52.71 188.12 191.84 206.33 243.01 246.57 272.25
## 5 NA NA 1025.84 795.37 1066.63 1032.39 2203.21
## 6 26.91 NA 336.09 335.38 308.56 360.81 245.82
## 7 NA NA 286.40 531.53 910.00 929.63 695.88
## 8 NA NA 1387.85 7317.37 33615.64 6184.73 10587.06
## 9 NA 126.96 473.96 105.13 101.83 121.06 NA
## 10 NA 193.06 86.76 181.67 115.75 127.20 NA
## 11 NA 5.79 104.05 1947.50 331.86 1481.62 NA
## 12 NA 759.60 436.09 2149.14 1039.38 800.94 NA
## 13 NA 467.97 654.90 300.41 796.18 1607.64 NA
## 14 581.74 1736.28 749.13 1072.40 1292.00 1203.78 104.37
## 15 328.26 108.70 109.24 100.50 142.38 117.61 170.38
## 16 NA NA 69.08 16.19 5.58 28.56 47.64
## 17 NA NA 26.85 5.72 12.34 11.68 53.37
## 18 NA NA 9.68 21.40 13.67 15.44 32.53
## 19 NA NA 49.24 199.91 162.59 404.21 321.66
## 20 3.60 1.45 36.27 11.85 11.23 24.99 62.75
## geom_2005 geom_2006 geom_2007 geom_2008 geom_2009 geom_2010 geom_2011
## 1 1.29 NA NA 36.15 154.18 309.78 315.11
## 2 1090.39 550.47 164.10 124.26 232.71 363.26 401.03
## 3 NA NA NA NA NA 1050.90 1501.18
## 4 59.14 270.00 47.90 52.71 98.63 113.23 121.02
## 5 NA NA NA NA NA 452.66 443.19
## 6 128.50 NA 9.93 24.76 NA 189.76 167.95
## 7 NA NA NA NA NA 87.93 106.47
## 8 NA NA NA NA NA 229.74 1209.34
## 9 NA NA NA NA 111.35 210.02 73.01
## 10 NA NA NA NA 49.10 40.81 114.44
## 11 NA NA NA NA 4.07 61.71 426.00
## 12 NA NA NA NA 60.99 67.48 97.17
## 13 NA NA NA NA 153.67 294.55 112.79
## 14 47.78 183.46 245.48 91.98 196.55 205.53 210.85
## 15 NA 67.63 55.61 21.12 46.89 26.74 43.32
## 16 NA NA NA NA NA 38.61 7.99
## 17 NA NA NA NA NA 6.16 2.51
## 18 NA NA NA NA NA 5.86 4.98
## 19 NA NA NA NA NA 4.47 50.35
## 20 5.07 2.79 10.48 3.60 0.92 24.72 2.96
## geom_2012 geom_2013 geom_2014 median_2005 median_2006 median_2007
## 1 NA NA NA 1.29 NA NA
## 2 NA NA NA 1141.32 693.36 183.38
## 3 1316.28 2264.85 2491.92 NA NA NA
## 4 122.26 138.11 162.10 59.67 1062.79 56.73
## 5 499.39 568.93 694.33 NA NA NA
## 6 169.68 209.86 137.38 135.71 NA 9.93
## 7 175.93 216.75 102.95 NA NA NA
## 8 1534.88 1796.71 1576.79 NA NA NA
## 9 71.03 84.34 NA NA NA NA
## 10 72.46 99.40 NA NA NA NA
## 11 30.84 190.77 NA NA NA NA
## 12 63.96 83.67 NA NA NA NA
## 13 485.62 598.73 NA NA NA NA
## 14 230.48 250.44 104.37 84.13 111.41 510.47
## 15 40.12 40.02 50.11 NA 54.63 83.01
## 16 2.09 4.61 10.17 NA NA NA
## 17 2.73 4.33 7.55 NA NA NA
## 18 4.33 8.18 13.30 NA NA NA
## 19 31.06 64.09 103.91 NA NA NA
## 20 2.02 24.96 35.16 6.13 0.85 7.50
## median_2008 median_2009 median_2010 median_2011 median_2012 median_2013
## 1 22.27 406.22 373.15 621.18 NA NA
## 2 152.06 136.77 393.93 367.45 NA NA
## 3 NA NA 1481.32 2627.73 2083.60 2445.23
## 4 52.71 84.06 116.53 115.86 122.21 173.30
## 5 NA NA 468.20 457.77 442.86 437.59
## 6 26.91 NA 222.04 181.76 176.55 234.23
## 7 NA NA 92.77 90.42 155.85 174.15
## 8 NA NA 99.74 2507.54 2679.60 4599.90
## 9 NA 118.80 118.41 100.19 88.71 120.15
## 10 NA 39.17 81.73 123.38 110.69 104.98
## 11 NA 6.52 108.03 1056.23 7.28 104.50
## 12 NA 58.09 62.58 72.23 61.22 76.84
## 13 NA 184.00 235.76 185.58 957.85 373.49
## 14 37.52 240.35 237.25 250.00 222.44 239.95
## 15 15.11 71.67 25.94 42.43 49.32 38.63
## 16 NA NA 41.37 9.14 1.91 7.29
## 17 NA NA 3.67 4.23 2.21 1.95
## 18 NA NA 5.07 3.08 4.17 10.06
## 19 NA NA 6.81 91.63 56.42 227.21
## 20 3.60 0.70 36.27 1.21 1.87 24.99
## median_2014 min_dist Cuenca_min_dist
## 1 NA 185318.80 17
## 2 NA 185318.80 13
## 3 2401.49 195221.08 17
## 4 185.36 90967.10 44
## 5 500.00 72528.65 45
## 6 170.56 72528.65 44
## 7 83.48 45876.23 48
## 8 3751.74 45876.23 47
## 9 NA 24977.81 52
## 10 NA 24977.81 51
## 11 NA 52369.51 52
## 12 NA 43308.28 52
## 13 NA 68545.60 53
## 14 104.37 87920.84 55
## 15 39.20 81283.60 71
## 16 16.61 81283.60 60
## 17 4.35 112686.95 83
## 18 9.69 112686.95 81
## 19 322.62 89357.86 103
## 20 23.67 89357.86 101
# Especificar el predictorMatrix
pred <- make.predictorMatrix(data_combined)
# Incluir la distancia mínima como predictor
pred[ , "min_dist"] <- 1
pred["Cuenca", ] <- 0
# Imputar los datos
imputed_data <- mice(data_combined, m=5, method='pmm', maxit=50, seed=500, predictorMatrix=pred, printFlag=FALSE)
# Obtener los datos imputados y separa en 4 matrices completas
complete_data <- complete(imputed_data)
data_count_imputed <- complete_data %>% select(Cuenca, starts_with("count"))
data_mean_imputed <- complete_data %>% select(Cuenca, starts_with("mean"))
data_geometric_mean_imputed <- complete_data %>% select(Cuenca, starts_with("geom"))
data_median_imputed <- complete_data %>% select(Cuenca, starts_with("median"))
| Cuenca | count_2005 | count_2006 | count_2007 | count_2008 | count_2009 | count_2010 | count_2011 | count_2012 | count_2013 | count_2014 |
|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 1 | 0 | 0 | 13 | 6 | 13 | 17 | 0 | 0 | 0 |
| 17 | 4 | 2 | 13 | 8 | 7 | 7 | 9 | 0 | 0 | 0 |
| 21 | 0 | 0 | 0 | 0 | 0 | 11 | 15 | 15 | 26 | 25 |
| 43 | 2 | 7 | 5 | 1 | 6 | 349 | 365 | 400 | 431 | 339 |
| 44 | 0 | 0 | 0 | 0 | 0 | 39 | 70 | 58 | 43 | 43 |
| 45 | 2 | 0 | 1 | 2 | 0 | 283 | 240 | 251 | 190 | 167 |
| 47 | 0 | 0 | 0 | 0 | 0 | 92 | 97 | 104 | 112 | 98 |
| 48 | 0 | 0 | 0 | 0 | 0 | 9 | 14 | 16 | 19 | 16 |
| 51 | 0 | 0 | 0 | 0 | 7 | 5 | 13 | 14 | 12 | 0 |
| 52 | 0 | 0 | 0 | 0 | 17 | 8 | 29 | 29 | 9 | 0 |
| 53 | 0 | 0 | 0 | 0 | 7 | 4 | 14 | 25 | 21 | 0 |
| 54 | 0 | 0 | 0 | 0 | 232 | 220 | 280 | 290 | 228 | 0 |
| 55 | 0 | 0 | 0 | 0 | 6 | 8 | 16 | 37 | 38 | 0 |
| 57 | 7 | 5 | 16 | 9 | 630 | 771 | 799 | 835 | 694 | 1 |
| 60 | 0 | 4 | 10 | 8 | 4 | 94 | 93 | 107 | 92 | 84 |
| 71 | 0 | 0 | 0 | 0 | 0 | 4 | 5 | 6 | 11 | 6 |
| 81 | 0 | 0 | 0 | 0 | 0 | 7 | 6 | 9 | 5 | 5 |
| 83 | 0 | 0 | 0 | 0 | 0 | 17 | 16 | 31 | 52 | 33 |
| 101 | 0 | 0 | 0 | 0 | 0 | 19 | 13 | 18 | 6 | 23 |
| 103 | 4 | 4 | 4 | 1 | 5 | 2 | 3 | 6 | 2 | 4 |
| Cuenca | mean_2005 | mean_2006 | mean_2007 | mean_2008 | mean_2009 | mean_2010 | mean_2011 | mean_2012 | mean_2013 | mean_2014 |
|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 1.29 | 26.45 | 22.27 | 166.12 | 460.99 | 392.59 | 582.09 | 13.67 | 15.44 | 53.37 |
| 17 | 4090.04 | 693.36 | 309.99 | 288.41 | 446.97 | 704.31 | 794.59 | 1039.38 | 1481.62 | 695.88 |
| 21 | 59.67 | 1480.77 | 156.13 | 288.41 | 759.60 | 1829.60 | 2269.57 | 3541.83 | 2490.15 | 2782.23 |
| 43 | 59.67 | 832.17 | 62.04 | 52.71 | 188.12 | 191.84 | 206.33 | 243.01 | 246.57 | 272.25 |
| 44 | 135.71 | 832.17 | 22.27 | 166.12 | 460.99 | 1025.84 | 795.37 | 1066.63 | 1032.39 | 2203.21 |
| 45 | 135.71 | 86.91 | 9.93 | 26.91 | 1.45 | 336.09 | 335.38 | 308.56 | 360.81 | 245.82 |
| 47 | 135.71 | 693.36 | 9.93 | 288.41 | 1.45 | 286.40 | 531.53 | 910.00 | 929.63 | 695.88 |
| 48 | 4090.04 | 86.91 | 62.04 | 581.74 | 108.70 | 1387.85 | 7317.37 | 33615.64 | 6184.73 | 10587.06 |
| 51 | 135.71 | 832.17 | 156.13 | 26.91 | 126.96 | 473.96 | 105.13 | 101.83 | 121.06 | 321.66 |
| 52 | 1.29 | 26.45 | 62.04 | 26.91 | 193.06 | 86.76 | 181.67 | 115.75 | 127.20 | 47.64 |
| 53 | 4090.04 | 26.45 | 309.99 | 52.71 | 5.79 | 104.05 | 1947.50 | 331.86 | 1481.62 | 2203.21 |
| 54 | 59.67 | 832.17 | 156.13 | 288.41 | 759.60 | 436.09 | 2149.14 | 1039.38 | 800.94 | 47.64 |
| 55 | 106.93 | 26.45 | 309.99 | 26.91 | 467.97 | 654.90 | 300.41 | 796.18 | 1607.64 | 2782.23 |
| 57 | 106.93 | 1480.77 | 625.59 | 581.74 | 1736.28 | 749.13 | 1072.40 | 1292.00 | 1203.78 | 104.37 |
| 60 | 26.36 | 86.91 | 156.13 | 328.26 | 108.70 | 109.24 | 100.50 | 142.38 | 117.61 | 170.38 |
| 71 | 59.67 | 86.91 | 62.04 | 288.41 | 446.97 | 69.08 | 16.19 | 5.58 | 28.56 | 47.64 |
| 81 | 26.36 | 693.36 | 62.04 | 3.60 | 446.97 | 26.85 | 5.72 | 12.34 | 11.68 | 53.37 |
| 83 | 59.67 | 1480.77 | 22.27 | 166.12 | 467.97 | 9.68 | 21.40 | 13.67 | 15.44 | 32.53 |
| 101 | 26.36 | 1480.77 | 156.13 | 52.71 | 460.99 | 49.24 | 199.91 | 162.59 | 404.21 | 321.66 |
| 103 | 26.36 | 26.45 | 22.27 | 3.60 | 1.45 | 36.27 | 11.85 | 11.23 | 24.99 | 62.75 |
| Cuenca | geom_2005 | geom_2006 | geom_2007 | geom_2008 | geom_2009 | geom_2010 | geom_2011 | geom_2012 | geom_2013 | geom_2014 |
|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 1.29 | 67.63 | 47.90 | 36.15 | 154.18 | 309.78 | 315.11 | 175.93 | 598.73 | 13.30 |
| 17 | 1090.39 | 550.47 | 164.10 | 124.26 | 232.71 | 363.26 | 401.03 | 2.73 | 190.77 | 10.17 |
| 21 | 59.14 | 183.46 | 55.61 | 24.76 | 196.55 | 1050.90 | 1501.18 | 1316.28 | 2264.85 | 2491.92 |
| 43 | 59.14 | 270.00 | 47.90 | 52.71 | 98.63 | 113.23 | 121.02 | 122.26 | 138.11 | 162.10 |
| 44 | 1090.39 | 67.63 | 55.61 | 3.60 | 232.71 | 452.66 | 443.19 | 499.39 | 568.93 | 694.33 |
| 45 | 128.50 | 550.47 | 9.93 | 24.76 | 153.67 | 189.76 | 167.95 | 169.68 | 209.86 | 137.38 |
| 47 | 47.78 | 270.00 | 9.93 | 21.12 | 60.99 | 87.93 | 106.47 | 175.93 | 216.75 | 102.95 |
| 48 | 128.50 | 183.46 | 47.90 | 124.26 | 232.71 | 229.74 | 1209.34 | 1534.88 | 1796.71 | 1576.79 |
| 51 | 128.50 | 2.79 | 47.90 | 36.15 | 111.35 | 210.02 | 73.01 | 71.03 | 84.34 | 104.37 |
| 52 | 1.29 | 2.79 | 9.93 | 21.12 | 49.10 | 40.81 | 114.44 | 72.46 | 99.40 | 10.17 |
| 53 | 1.29 | 67.63 | 55.61 | 91.98 | 4.07 | 61.71 | 426.00 | 30.84 | 190.77 | 10.17 |
| 54 | 1.29 | 270.00 | 9.93 | 36.15 | 60.99 | 67.48 | 97.17 | 63.96 | 83.67 | 10.17 |
| 55 | 5.07 | 550.47 | 10.48 | 21.12 | 153.67 | 294.55 | 112.79 | 485.62 | 598.73 | 137.38 |
| 57 | 47.78 | 183.46 | 245.48 | 91.98 | 196.55 | 205.53 | 210.85 | 230.48 | 250.44 | 104.37 |
| 60 | 59.14 | 67.63 | 55.61 | 21.12 | 46.89 | 26.74 | 43.32 | 40.12 | 40.02 | 50.11 |
| 71 | 128.50 | 270.00 | 55.61 | 52.71 | 0.92 | 38.61 | 7.99 | 2.09 | 4.61 | 10.17 |
| 81 | 1.29 | 2.79 | 164.10 | 52.71 | 60.99 | 6.16 | 2.51 | 2.73 | 4.33 | 7.55 |
| 83 | 59.14 | 550.47 | 10.48 | 36.15 | 60.99 | 5.86 | 4.98 | 4.33 | 8.18 | 13.30 |
| 101 | 1.29 | 183.46 | 47.90 | 24.76 | 46.89 | 4.47 | 50.35 | 31.06 | 64.09 | 103.91 |
| 103 | 5.07 | 2.79 | 10.48 | 3.60 | 0.92 | 24.72 | 2.96 | 2.02 | 24.96 | 35.16 |
| Cuenca | median_2005 | median_2006 | median_2007 | median_2008 | median_2009 | median_2010 | median_2011 | median_2012 | median_2013 | median_2014 |
|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 1.29 | 693.36 | 9.93 | 22.27 | 406.22 | 373.15 | 621.18 | 1.91 | 437.59 | 4.35 |
| 17 | 1141.32 | 693.36 | 183.38 | 152.06 | 136.77 | 393.93 | 367.45 | 7.28 | 7.29 | 322.62 |
| 21 | NA | 693.36 | 7.50 | 152.06 | 240.35 | 1481.32 | 2627.73 | 2083.60 | 2445.23 | 2401.49 |
| 43 | 59.67 | 1062.79 | 56.73 | 52.71 | 84.06 | 116.53 | 115.86 | 122.21 | 173.30 | 185.36 |
| 44 | NA | 1062.79 | 83.01 | 37.52 | 136.77 | 468.20 | 457.77 | 442.86 | 437.59 | 500.00 |
| 45 | 135.71 | 54.63 | 9.93 | 26.91 | 84.06 | 222.04 | 181.76 | 176.55 | 234.23 | 170.56 |
| 47 | NA | 1062.79 | 183.38 | 52.71 | 39.17 | 92.77 | 90.42 | 155.85 | 174.15 | 83.48 |
| 48 | NA | 1062.79 | 9.93 | 37.52 | 58.09 | 99.74 | 2507.54 | 2679.60 | 4599.90 | 3751.74 |
| 51 | NA | 1062.79 | 83.01 | 22.27 | 118.80 | 118.41 | 100.19 | 88.71 | 120.15 | 39.20 |
| 52 | NA | 111.41 | 83.01 | 26.91 | 39.17 | 81.73 | 123.38 | 110.69 | 104.98 | 23.67 |
| 53 | NA | 0.85 | 9.93 | 37.52 | 6.52 | 108.03 | 1056.23 | 7.28 | 104.50 | 4.35 |
| 54 | NA | 1062.79 | 9.93 | 26.91 | 58.09 | 62.58 | 72.23 | 61.22 | 76.84 | 23.67 |
| 55 | NA | 54.63 | 9.93 | 52.71 | 184.00 | 235.76 | 185.58 | 957.85 | 373.49 | 23.67 |
| 57 | 84.13 | 111.41 | 510.47 | 37.52 | 240.35 | 237.25 | 250.00 | 222.44 | 239.95 | 104.37 |
| 60 | NA | 54.63 | 83.01 | 15.11 | 71.67 | 25.94 | 42.43 | 49.32 | 38.63 | 39.20 |
| 71 | NA | 0.85 | 9.93 | 37.52 | 0.70 | 41.37 | 9.14 | 1.91 | 7.29 | 16.61 |
| 81 | NA | 0.85 | 7.50 | 37.52 | 39.17 | 3.67 | 4.23 | 2.21 | 1.95 | 4.35 |
| 83 | NA | 111.41 | 83.01 | 22.27 | 0.70 | 5.07 | 3.08 | 4.17 | 10.06 | 9.69 |
| 101 | NA | 693.36 | 9.93 | 37.52 | 39.17 | 6.81 | 91.63 | 56.42 | 227.21 | 322.62 |
| 103 | 6.13 | 0.85 | 7.50 | 3.60 | 0.70 | 36.27 | 1.21 | 1.87 | 24.99 | 23.67 |