This document is part of the coursework for the Applied Multivariate Methods course (Métodos multivariados aplicados) at the Universidad Nacional de Colombia, within the Master's and Specialization programs in Statistics. It gives a technical, step-by-step account of how the work was carried out.
data %>% select(VENDEDOR, NUM_DOC, FECHA, ITEM, CLIENTE, COD_SUBGRUPO, COD_DEP, DEP, COD_CIU, COD_CCO)
10 Variables 9898 Observations
--------------------------------------------------------------------------------
VENDEDOR
n missing distinct
9898 0 17
lowest : 001 002 003 005 007, highest: 030 031 032 035 036
Value      1     2     3     5     7     9     11    14    17    18    22
Frequency  201   290   357   1209  687   28    1     1344  2302  164   2379
Proportion 0.020 0.029 0.036 0.122 0.069 0.003 0.000 0.136 0.233 0.017 0.240

Value      27    30    31    32    35    36
Frequency  122   350   4     375   24    61
Proportion 0.012 0.035 0.000 0.038 0.002 0.006
--------------------------------------------------------------------------------
NUM_DOC
n missing distinct
9898 0 7788
lowest : DVV-0000260 DVV-0000261 DVV-0000262 DVV-0000263 DVV-0000318
highest: MED-37335 MED-37337 MED-37338 MED-37339 MED-37340
--------------------------------------------------------------------------------
FECHA
n     missing  distinct  Info  Mean        Gmd  .05
9898  0        736       1     2015-07-27  364  2014-03-21
.10         .25         .50         .75         .90         .95
2014-05-22  2014-10-22  2015-08-01  2016-04-25  2016-10-08  2016-11-21
lowest : 2014-01-10 2014-01-11 2014-01-12 2014-01-13 2014-01-16
highest: 2017-01-26 2017-01-27 2017-01-28 2017-01-29 2017-01-30
--------------------------------------------------------------------------------
ITEM
n missing distinct
9898 0 248
lowest : 1000-0001 1000-0004 1000-0009 1100-0002 1100-0003
highest: 900-0020 900-0022 900-0024 900-0025 900-0027
--------------------------------------------------------------------------------
CLIENTE
n missing distinct
9898 0 371
lowest : 10109050 102197600 1022348758 1024501802 1026553774
highest: 9815991 98535261 98584936 985849362 98665688
--------------------------------------------------------------------------------
COD_SUBGRUPO
n missing distinct
9898 0 24
lowest : 001 002 003 004 005, highest: 028 040 100 101 102
--------------------------------------------------------------------------------
COD_DEP
n missing distinct
9305 593 17
lowest : 05 08 11 13 15, highest: 63 66 68 73 76
Value      5     8     11    13    15    17    19    25    41    50    52
Frequency  3421  143   2790  1     11    41    8     1066  6     1     4
Proportion 0.368 0.015 0.300 0.000 0.001 0.004 0.001 0.115 0.001 0.000 0.000

Value      54    63    66    68    73    76
Frequency  64    32    444   129   61    1083
Proportion 0.007 0.003 0.048 0.014 0.007 0.116
--------------------------------------------------------------------------------
DEP
n missing distinct
9305 593 16
lowest : ANTIOQUIA ATLÁNTICO BOGOTÁ, D.C. BOLÍVAR BOYACÁ
highest: QUINDIO RISARALDA SANTANDER TOLIMA VALLE DEL CAUCA
ANTIOQUIA (3421, 0.368), ATLÁNTICO (143, 0.015), BOGOTÁ, D.C. (3856, 0.414),
BOLÍVAR (1, 0.000), BOYACÁ (11, 0.001), CALDAS (41, 0.004), CAUCA (8, 0.001),
HUILA (6, 0.001), META (1, 0.000), NARIÑO (4, 0.000), NORTE DE SANTANDER (64,
0.007), QUINDIO (32, 0.003), RISARALDA (444, 0.048), SANTANDER (129, 0.014),
TOLIMA (61, 0.007), VALLE DEL CAUCA (1083, 0.116)
--------------------------------------------------------------------------------
COD_CIU
n missing distinct
9898 0 51
lowest : 05000 05001 05088 05129 05237, highest: 76147 76364 76520 76834 76892
--------------------------------------------------------------------------------
COD_CCO
n missing distinct
9898 0 12
lowest : 001 002 003 005 006, highest: 009 010 012 014 015
Value      1     2     3     5     6     7     8     9     10    12    14
Frequency  7080  2802  7     1     1     1     1     1     1     1     1
Proportion 0.715 0.283 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Value      15
Frequency  1
Proportion 0.000
--------------------------------------------------------------------------------
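One point in the summary above may be worth checking: COD_DEP reports 17 distinct values while DEP reports 16, and the BOGOTÁ, D.C. count (3856) equals the sum of the counts for codes 11 (2790) and 25 (1066). The following base R sketch shows one way such a many-to-one mapping, and the rows with a missing code, could be located. The data frame `toy` and its rows are hypothetical stand-ins, not the course data.

```r
# Hypothetical toy data mimicking the COD_DEP / DEP columns (not the real data).
toy <- data.frame(
  COD_DEP = c("05", "05", "11", "25", "76", NA),
  DEP     = c("ANTIOQUIA", "ANTIOQUIA", "BOGOTÁ, D.C.", "BOGOTÁ, D.C.",
              "VALLE DEL CAUCA", NA),
  stringsAsFactors = FALSE
)

# Distinct code/name pairs, ignoring rows where either field is missing.
code_name <- unique(toy[complete.cases(toy), c("COD_DEP", "DEP")])

# Number of distinct codes attached to each department name.
codes_per_name <- tapply(code_name$COD_DEP, code_name$DEP, length)

# Names that are attached to more than one code.
ambiguous <- names(codes_per_name)[codes_per_name > 1]

# Rows with a missing department code.
n_missing <- sum(is.na(toy$COD_DEP))

print(ambiguous)
print(n_missing)
```

On the real data, the same steps would be applied to `data` instead of `toy`. Whether a name shared by two codes is an error or an intended grouping is a question the summary alone does not answer.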
data %>% select(-c(VENDEDOR, NUM_DOC, FECHA, ITEM, CLIENTE, COD_SUBGRUPO, COD_DEP, DEP, COD_CIU, COD_CCO))
7 Variables 9898 Observations
--------------------------------------------------------------------------------
AÑO
n     missing  distinct  Info   Mean  Gmd
9898  0        4         0.893  2015  0.917

Value      2014  2015  2016  2017
Frequency  3160  3254  3354  130
Proportion 0.319 0.329 0.339 0.013
--------------------------------------------------------------------------------
MES
n     missing  distinct  Info   Mean  Gmd   .05  .10
9898  0        12        0.993  6.85  3.83  1    2
.25  .50  .75  .90  .95
4    7    10   11   12
lowest : 1 2 3 4 5, highest: 8 9 10 11 12
Value      1     2     3     4     5     6     7     8     9     10    11
Frequency  552   712   762   882   791   765   922   926   853   983   949
Proportion 0.056 0.072 0.077 0.089 0.080 0.077 0.093 0.094 0.086 0.099 0.096

Value      12
Frequency  801
Proportion 0.081
--------------------------------------------------------------------------------
DIA_PLA
n     missing  distinct  Info   Mean  Gmd
9898  0        9         0.892  41    27
lowest : 0 1 8 15 30, highest: 30 45 60 75 90
Value      0     1     8     15    30    45    60    75    90
Frequency  1793  6     6     52    2915  385   4204  33    504
Proportion 0.181 0.001 0.001 0.005 0.295 0.039 0.425 0.003 0.051
--------------------------------------------------------------------------------
CANTIDAD
n     missing  distinct  Info   Mean  Gmd  .05  .10
9898  0        272       0.995  105   142  3.0  9.7
.25   .50   .75    .90    .95
25.0  42.0  100.0  200.0  400.0
lowest : -2000 -1000 -760 -600 -500, highest: 4000 4100 4500 4994 7500
--------------------------------------------------------------------------------
PRE_TOT
n     missing  distinct  Info  Mean     Gmd      .05    .10
9898  0        4071      1     1798791  2777935  76948  132500
.25     .50     .75      .90      .95
212000  450000  1166782  3547008  7072510
lowest : -20320816 -16082100 -11235194 -10494000 -7009593
highest: 68350560 71879720 124830404 158253480 160170648
--------------------------------------------------------------------------------
TRM
n     missing  distinct  Info  Mean  Gmd  .05   .10
9898  0        693       1     1900  131  1772  1780
.25   .50   .75   .90   .95
1802  1882  1934  2049  2158
lowest : 1754.89 1757.24 1758.03 1758.38 1758.45
highest: 2414.39 2423.56 2438.79 2442.03 2446.35
--------------------------------------------------------------------------------
PRE_TOT_US
n     missing  distinct  Info  Mean  Gmd   .05   .10
9898  0        9180      1     941   1450  40.0  67.9
.25    .50    .75    .90     .95
111.5  232.8  606.0  1831.2  3648.3
lowest : -10569.665 -8957.391 -5841.957 -5122.148 -3883.130
highest: 35498.808 38688.067 64779.996 66383.999 66383.999
--------------------------------------------------------------------------------
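The summaries above show negative values in CANTIDAD, PRE_TOT and PRE_TOT_US, and the magnitudes suggest that PRE_TOT_US is PRE_TOT converted with TRM. The source does not state the conversion rule or what the negative rows mean, so the sketch below treats `PRE_TOT_US = PRE_TOT / TRM` as an assumption and only flags the negative rows without interpreting them. The data frame `toy` and its values are hypothetical.

```r
# Hypothetical rows with the same column names as the summary (values are made up).
toy <- data.frame(
  CANTIDAD = c(25, -10, 100),
  PRE_TOT  = c(212000, -450000, 1166782),
  TRM      = c(1802, 1882, 1934)
)

# Stand-in for the stored dollar column, built under the assumed rule
# PRE_TOT_US = PRE_TOT / TRM and rounded to three decimals.
toy$PRE_TOT_US <- round(toy$PRE_TOT / toy$TRM, 3)

# Flag rows with a negative quantity or a negative total.
toy$NEGATIVE <- toy$CANTIDAD < 0 | toy$PRE_TOT < 0
n_neg <- sum(toy$NEGATIVE)

# Check the stored dollar amount against the assumed rule, within a small tolerance.
conversion_ok <- abs(toy$PRE_TOT_US - toy$PRE_TOT / toy$TRM) < 0.01

print(toy[toy$NEGATIVE, ])
print(all(conversion_ok))
```

If the tolerance check failed on the real data, that would indicate a different conversion rule (for example, another rate or rounding step) rather than necessarily an error in the records.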
The next section carries out the cluster analysis using several techniques, both hierarchical and non-hierarchical…
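As a rough illustration of the two families of techniques mentioned, the following base R sketch runs one hierarchical method (`hclust` with Ward linkage) and one non-hierarchical method (`kmeans`) on simulated data. The features `f1` and `f2`, the choice of two groups, and the Ward linkage are assumptions made for the example; they are not taken from the course work.

```r
set.seed(123)  # k-means uses random starts; fix the seed for reproducibility

# Hypothetical numeric features for 30 observations (two artificial groups).
x <- rbind(
  matrix(rnorm(30, mean = 0), ncol = 2),
  matrix(rnorm(30, mean = 5), ncol = 2)
)
colnames(x) <- c("f1", "f2")

# Standardize so that no variable dominates the Euclidean distance.
z <- scale(x)

# Hierarchical clustering: Ward linkage on Euclidean distances, cut into 2 groups.
hc <- hclust(dist(z), method = "ward.D2")
groups_hc <- cutree(hc, k = 2)

# Non-hierarchical clustering: k-means with 2 centers and several random starts.
km <- kmeans(z, centers = 2, nstart = 10)
groups_km <- km$cluster

# Cross-tabulate the two partitions to see how far they agree.
print(table(hierarchical = groups_hc, kmeans = groups_km))
```

Cluster labels are arbitrary, so agreement between the two partitions shows up as a table with one dominant cell per row, not necessarily on the diagonal.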