1. Data profiling

By the end of this topic, we will have a clear understanding of the following:

The main concepts of data profiling.
Why and how an exploratory analysis should be carried out.
How to identify the main problems present in a database.
Which are the main tidyverse-compatible libraries for data cleaning.

1.1 Introduction

Data profiling is the process of thoroughly reviewing the information sources used in a data analysis project in order to understand their structure, content, and interrelationships, and to identify their potential and their problems.
As a data cleaning tool, R reduces (or avoids) licensing costs and is a powerful language for statistical analysis thanks to its large ecosystem of specialized packages; it is also well documented and quick to learn. In addition:
- It is easy to automate: scripts can read data or run operations on it the same way every time.
- It can read practically any kind of structured data (rows and columns): delimited text (csv, txt), Excel (xlsx), SAS, Stata, SPSS, SQL, etc.; and unstructured data as well: text, audio, images, and video.
- Up to a point, it can handle large data sets.
- It has advanced graphics capabilities, so we can build charts (Top 50 ggplot2 Visualizations) and dashboards (shiny) to present results attractively.
- Its functionality improves constantly, because a large community creates new functions, fixes bugs and, above all, documents its work well, which makes functions and methods easy to use.

1- The first pane holds a script (.R), where we can write, run, and save our code.
2- Below it is the console, which shows the results of the code we run and where we can try things out.
3- The third pane shows our workspace, with the tables, functions, and variables we create.
4- In the lower-right corner we find the plots we create, the installed packages, the files, and a help window.

1.1.1 What is data profiling?

Today there are several statistical techniques that provide information about the qualitative characteristics of data; one of their main goals is to discover and validate the metadata of the data set at hand.
Some common uses of data profiling include:
- Data warehousing and business intelligence.
- Structuring and normalizing databases.
- Data quality.
- Data science models.
In all of them, profiling makes it possible to discover problems in the data sources, their possible causes (for example, human error or data corruption), and what needs to be done to correct them when the data are integrated.

1.1.2 Considerations and tools

A data profiling process involves the following tasks:
- Collect descriptive statistics such as minimums, maximums, counts, and totals.
- Collect data types, lengths, and recurring patterns.
- Tag data with keywords, descriptions, or categories.
- Assess the risk of dependencies and joins in the data.
- Identify distributions and perform cross-table analysis.
Common situations we will encounter (there are many more):
- Data inconsistencies at the source, such as missing records and null values.
- A column that is meant to be a primary key but does not contain unique values.
- A database entity-relationship schema that is inconsistent with the business requirements.
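
Two of the situations above (null values and a primary key that is not unique) can be checked with a few lines of base R. The following is a minimal sketch; the `customers` table and its values are made up for illustration.

```r
# Hypothetical customer table (names and values are assumptions)
customers <- data.frame(
  id   = c(1, 2, 2, 4, NA),
  city = c("Quito", "Cuenca", NA, "Loja", "Quito"),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(customers))

# A usable primary key has no missing values and no repeated values
id_is_key <- !anyNA(customers$id) && anyDuplicated(customers$id) == 0

missing_per_column
id_is_key # FALSE: the id column has an NA and a repeated 2
```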

1.1.3 Data Profiling vs. Data Quality

Confusing Data Profiling with Data Quality is a very common mistake: each deals with a different stage of data management. We will go deeper into the concept of data quality later on.
Profiling is a diagnostic and analysis stage in which the quality requirements are determined. To talk about data quality, by contrast, we must evaluate the state of our data against a desired target and quantify it.
In general, the Data Quality stage comes after Data Profiling.

The following diagram shows the Data Quality life cycle:

1.1.4 The tidyverse package

Tidyverse is a collection of packages that covers most of the tasks a data scientist has to perform. It is a contribution from one of the best-known figures in the R community: Hadley Wickham.
It includes the following libraries for data mining:

ggplot2: The best-known library. It is a grammar of graphics for exploring data and communicating conclusions. It lets us choose the data to plot and the geometries, scales, coordinates, facets, zooms, etc.
dplyr: The second best-known library. Built for transforming data, it is roughly the equivalent of SQL and covers many of the same operations.
readr: A library for reading data from different sources. Its advantage over other R reading libraries is that it integrates smoothly with the two libraries above by chaining commands with the pipe operator: %>%.
purrr: A library for applying a function to every element of a vector or list, which complements one of R's great strengths: vectorization. The following example illustrates it:

We only need to install tidyverse!

library(tidyverse) # loads ggplot2, dplyr, readr, purrr, and the other core packages

library(purrr) # load the library (already attached by tidyverse; shown for clarity)

mtcars %>% str()

## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

mtcars %>% # choose the data set to work with
  split(.$cyl) %>% # split the data by the distinct values of cyl
  map(~ lm(mpg ~ wt, data = .)) %>% # fit a linear regression on each subset
  map(summary) %>% # compute the model summary for each subset
  map_dbl("r.squared") # extract the coefficient of determination for cyl = 4, 6, and 8

##         4         6         8 
## 0.5086326 0.4645102 0.4229655

1.2 Exploratory analysis

There are three main types of analysis: structure, content, and relationships.

Knowledge of our business is also key: how the data behave in the industry, company, and systems we work with. There is an additional (often informal) analysis that validates data rules, checking that individual records and whole data sets conform to predefined rules. For example:

Customer age must be at least 18 and must not exceed a given upper limit.
The sales trend.
Socioeconomic and demographic segments in general.
The distribution of gender, cities, etc.
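
A rule such as the age limit above can be expressed as a logical condition and used to list the records that violate it. This is a minimal sketch with made-up data; the upper limit of 100 is an assumption chosen only for the example.

```r
# Hypothetical customer ages (illustrative values)
customers <- data.frame(id = 1:5, age = c(25, 17, 40, 130, NA))

min_age <- 18
max_age <- 100 # assumed upper bound; the real limit is a business decision

# TRUE when the rule holds; a missing age counts as a violation here
rule_ok <- !is.na(customers$age) &
  customers$age >= min_age &
  customers$age <= max_age

violations <- customers[!rule_ok, ]
violations # rows with id 2, 4, and 5
```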

1.2.1 Structure

This consists of validating that the data are consistent and correctly formatted, which requires running mathematical checks on the data (sum, minimum, maximum, etc.).

Examples:

Number of digits in phone numbers.
Number of digits in national ID and/or passport numbers, depending on the country.
Number of digits in the postal code.
Number of times a value appears within a column.
It focuses on column-level validations and is also known as a Column profile.
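
As a small illustration of the first and last examples, the sketch below counts the characters of each phone number and the occurrences of each value. The phone numbers and the expected length of 10 digits are assumptions for the example.

```r
# Hypothetical phone numbers stored as text (illustrative values)
phones <- c("0991234567", "099123456", "09912345678", "0991234567", NA)
expected_digits <- 10 # assumed length of a valid number

# Number of characters per value; keepNA = TRUE keeps missing values as NA
n_digits <- nchar(phones, keepNA = TRUE)

# Values whose length differs from the expected one
wrong_length <- phones[!is.na(n_digits) & n_digits != expected_digits]
wrong_length

# Number of times each value appears in the column
table(phones)
```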

# this symbol lets us add comments to our code

comunidades <- read.csv('https://storage.googleapis.com/datasets-academy/Profiling/Data/communities.csv', sep = ',', na.strings = "?") # read a csv file

str(comunidades) # compactly displays the structure of an R object

## 'data.frame':    1994 obs. of  128 variables:
##  $ state                : int  8 53 24 34 42 6 44 6 21 29 ...
##  $ county               : int  NA NA NA 5 95 NA 7 NA NA NA ...
##  $ community            : int  NA NA NA 81440 6096 NA 41500 NA NA NA ...
##  $ communityname        : Factor w/ 1828 levels "Aberdeencity",..: 796 1626 2 1788 142 1520 840 1462 669 288 ...
##  $ fold                 : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ population           : num  0.19 0 0 0.04 0.01 0.02 0.01 0.01 0.03 0.01 ...
##  $ householdsize        : num  0.33 0.16 0.42 0.77 0.55 0.28 0.39 0.74 0.34 0.4 ...
##  $ racepctblack         : num  0.02 0.12 0.49 1 0.02 0.06 0 0.03 0.2 0.06 ...
##  $ racePctWhite         : num  0.9 0.74 0.56 0.08 0.95 0.54 0.98 0.46 0.84 0.87 ...
##  $ racePctAsian         : num  0.12 0.45 0.17 0.12 0.09 1 0.06 0.2 0.02 0.3 ...
##  $ racePctHisp          : num  0.17 0.07 0.04 0.1 0.05 0.25 0.02 1 0 0.03 ...
##  $ agePct12t21          : num  0.34 0.26 0.39 0.51 0.38 0.31 0.3 0.52 0.38 0.9 ...
##  $ agePct12t29          : num  0.47 0.59 0.47 0.5 0.38 0.48 0.37 0.55 0.45 0.82 ...
##  $ agePct16t24          : num  0.29 0.35 0.28 0.34 0.23 0.27 0.23 0.36 0.28 0.8 ...
##  $ agePct65up           : num  0.32 0.27 0.32 0.21 0.36 0.37 0.6 0.35 0.48 0.39 ...
##  $ numbUrban            : num  0.2 0.02 0 0.06 0.02 0.04 0.02 0 0.04 0.02 ...
##  $ pctUrban             : num  1 1 0 1 0.9 1 0.81 0 1 1 ...
##  $ medIncome            : num  0.37 0.31 0.3 0.58 0.5 0.52 0.42 0.16 0.17 0.54 ...
##  $ pctWWage             : num  0.72 0.72 0.58 0.89 0.72 0.68 0.5 0.44 0.47 0.59 ...
##  $ pctWFarmSelf         : num  0.34 0.11 0.19 0.21 0.16 0.2 0.23 1 0.36 0.22 ...
##  $ pctWInvInc           : num  0.6 0.45 0.39 0.43 0.68 0.61 0.68 0.23 0.34 0.86 ...
##  $ pctWSocSec           : num  0.29 0.25 0.38 0.36 0.44 0.28 0.61 0.53 0.55 0.42 ...
##  $ pctWPubAsst          : num  0.15 0.29 0.4 0.2 0.11 0.15 0.21 0.97 0.48 0.02 ...
##  $ pctWRetire           : num  0.43 0.39 0.84 0.82 0.71 0.25 0.54 0.41 0.43 0.31 ...
##  $ medFamInc            : num  0.39 0.29 0.28 0.51 0.46 0.62 0.43 0.15 0.21 0.85 ...
##  $ perCapInc            : num  0.4 0.37 0.27 0.36 0.43 0.72 0.47 0.1 0.23 0.89 ...
##  $ whitePerCap          : num  0.39 0.38 0.29 0.4 0.41 0.76 0.44 0.12 0.23 0.94 ...
##  $ blackPerCap          : num  0.32 0.33 0.27 0.39 0.28 0.77 0.4 0.08 0.19 0.11 ...
##  $ indianPerCap         : num  0.27 0.16 0.07 0.16 0 0.28 0.24 0.17 0.1 0.09 ...
##  $ AsianPerCap          : num  0.27 0.3 0.29 0.25 0.74 0.52 0.86 0.27 0.26 0.33 ...
##  $ OtherPerCap          : num  0.36 0.22 0.28 0.36 0.51 0.48 0.24 0.18 0.29 0.17 ...
##  $ HispPerCap           : num  0.41 0.35 0.39 0.44 0.48 0.6 0.36 0.21 0.22 0.8 ...
##  $ NumUnderPov          : num  0.08 0.01 0.01 0.01 0 0.01 0.01 0.03 0.04 0 ...
##  $ PctPopUnderPov       : num  0.19 0.24 0.27 0.1 0.06 0.12 0.11 0.64 0.45 0.11 ...
##  $ PctLess9thGrade      : num  0.1 0.14 0.27 0.09 0.25 0.13 0.29 0.96 0.52 0.04 ...
##  $ PctNotHSGrad         : num  0.18 0.24 0.43 0.25 0.3 0.12 0.41 0.82 0.59 0.03 ...
##  $ PctBSorMore          : num  0.48 0.3 0.19 0.31 0.33 0.8 0.36 0.12 0.17 1 ...
##  $ PctUnemployed        : num  0.27 0.27 0.36 0.33 0.12 0.1 0.28 1 0.55 0.11 ...
##  $ PctEmploy            : num  0.68 0.73 0.58 0.71 0.65 0.65 0.54 0.26 0.43 0.44 ...
##  $ PctEmplManu          : num  0.23 0.57 0.32 0.36 0.67 0.19 0.44 0.43 0.59 0.2 ...
##  $ PctEmplProfServ      : num  0.41 0.15 0.29 0.45 0.38 0.77 0.53 0.34 0.36 1 ...
##  $ PctOccupManu         : num  0.25 0.42 0.49 0.37 0.42 0.06 0.33 0.71 0.64 0.02 ...
##  $ PctOccupMgmtProf     : num  0.52 0.36 0.32 0.39 0.46 0.91 0.49 0.18 0.29 0.96 ...
##  $ MalePctDivorce       : num  0.68 1 0.63 0.34 0.22 0.49 0.25 0.38 0.62 0.3 ...
##  $ MalePctNevMarr       : num  0.4 0.63 0.41 0.45 0.27 0.57 0.34 0.47 0.26 0.85 ...
##  $ FemalePctDiv         : num  0.75 0.91 0.71 0.49 0.2 0.61 0.28 0.59 0.66 0.39 ...
##  $ TotalPctDiv          : num  0.75 1 0.7 0.44 0.21 0.58 0.28 0.52 0.67 0.36 ...
##  $ PersPerFam           : num  0.35 0.29 0.45 0.75 0.51 0.44 0.42 0.78 0.37 0.31 ...
##  $ PctFam2Par           : num  0.55 0.43 0.42 0.65 0.91 0.62 0.77 0.45 0.51 0.65 ...
##  $ PctKids2Par          : num  0.59 0.47 0.44 0.54 0.91 0.69 0.81 0.43 0.55 0.73 ...
##  $ PctYoungKids2Par     : num  0.61 0.6 0.43 0.83 0.89 0.87 0.79 0.34 0.58 0.78 ...
##  $ PctTeen2Par          : num  0.56 0.39 0.43 0.65 0.85 0.53 0.74 0.34 0.47 0.67 ...
##  $ PctWorkMomYoungKids  : num  0.74 0.46 0.71 0.85 0.4 0.3 0.57 0.29 0.65 0.72 ...
##  $ PctWorkMom           : num  0.76 0.53 0.67 0.86 0.6 0.43 0.62 0.27 0.64 0.71 ...
##  $ NumIlleg             : num  0.04 0 0.01 0.03 0 0 0 0.02 0.02 0 ...
##  $ PctIlleg             : num  0.14 0.24 0.46 0.33 0.06 0.11 0.13 0.5 0.29 0.07 ...
##  $ NumImmig             : num  0.03 0.01 0 0.02 0 0.04 0.01 0.02 0 0.01 ...
##  $ PctImmigRecent       : num  0.24 0.52 0.07 0.11 0.03 0.3 0 0.5 0.12 0.41 ...
##  $ PctImmigRec5         : num  0.27 0.62 0.06 0.2 0.07 0.35 0.02 0.59 0.09 0.44 ...
##  $ PctImmigRec8         : num  0.37 0.64 0.15 0.3 0.2 0.43 0.02 0.65 0.07 0.52 ...
##  $ PctImmigRec10        : num  0.39 0.63 0.19 0.31 0.27 0.47 0.1 0.59 0.13 0.48 ...
##  $ PctRecentImmig       : num  0.07 0.25 0.02 0.05 0.01 0.5 0 0.69 0 0.22 ...
##  $ PctRecImmig5         : num  0.07 0.27 0.02 0.08 0.02 0.5 0.01 0.72 0 0.21 ...
##  $ PctRecImmig8         : num  0.08 0.25 0.04 0.11 0.04 0.56 0.01 0.71 0 0.22 ...
##  $ PctRecImmig10        : num  0.08 0.23 0.05 0.11 0.05 0.57 0.03 0.6 0 0.19 ...
##  $ PctSpeakEnglOnly     : num  0.89 0.84 0.88 0.81 0.88 0.45 0.73 0.12 0.99 0.85 ...
##  $ PctNotSpeakEnglWell  : num  0.06 0.1 0.04 0.08 0.05 0.28 0.05 0.93 0.01 0.03 ...
##  $ PctLargHouseFam      : num  0.14 0.16 0.2 0.56 0.16 0.25 0.12 0.74 0.12 0.09 ...
##  $ PctLargHouseOccup    : num  0.13 0.1 0.2 0.62 0.19 0.19 0.13 0.75 0.12 0.06 ...
##  $ PersPerOccupHous     : num  0.33 0.17 0.46 0.85 0.59 0.29 0.42 0.8 0.35 0.15 ...
##  $ PersPerOwnOccHous    : num  0.39 0.29 0.52 0.77 0.6 0.53 0.54 0.68 0.38 0.34 ...
##  $ PersPerRentOccHous   : num  0.28 0.17 0.43 1 0.37 0.18 0.24 0.92 0.33 0.05 ...
##  $ PctPersOwnOccup      : num  0.55 0.26 0.42 0.94 0.89 0.39 0.65 0.39 0.5 0.48 ...
##  $ PctPersDenseHous     : num  0.09 0.2 0.15 0.12 0.02 0.26 0.03 0.89 0.1 0.03 ...
##  $ PctHousLess3BR       : num  0.51 0.82 0.51 0.01 0.19 0.73 0.46 0.66 0.64 0.58 ...
##  $ MedNumBR             : num  0.5 0 0.5 0.5 0.5 0 0.5 0 0 0 ...
##  $ HousVacant           : num  0.21 0.02 0.01 0.01 0.01 0.02 0.01 0.01 0.04 0.02 ...
##  $ PctHousOccup         : num  0.71 0.79 0.86 0.97 0.89 0.84 0.89 0.91 0.72 0.72 ...
##  $ PctHousOwnOcc        : num  0.52 0.24 0.41 0.96 0.87 0.3 0.57 0.46 0.49 0.38 ...
##  $ PctVacantBoarded     : num  0.05 0.02 0.29 0.6 0.04 0.16 0.09 0.22 0.05 0.07 ...
##  $ PctVacMore6Mos       : num  0.26 0.25 0.3 0.47 0.55 0.28 0.49 0.37 0.49 0.47 ...
##  $ MedYrHousBuilt       : num  0.65 0.65 0.52 0.52 0.73 0.25 0.38 0.6 0.5 0.04 ...
##  $ PctHousNoPhone       : num  0.14 0.16 0.47 0.11 0.05 0.02 0.05 0.28 0.57 0.01 ...
##  $ PctWOFullPlumb       : num  0.06 0 0.45 0.11 0.14 0.05 0.05 0.23 0.22 0 ...
##  $ OwnOccLowQuart       : num  0.22 0.21 0.18 0.24 0.31 0.94 0.37 0.15 0.07 0.63 ...
##  $ OwnOccMedVal         : num  0.19 0.2 0.17 0.21 0.31 1 0.38 0.13 0.07 0.71 ...
##  $ OwnOccHiQuart        : num  0.18 0.21 0.16 0.19 0.3 1 0.39 0.13 0.08 0.79 ...
##  $ RentLowQ             : num  0.36 0.42 0.27 0.75 0.4 0.67 0.26 0.21 0.14 0.44 ...
##  $ RentMedian           : num  0.35 0.38 0.29 0.7 0.36 0.63 0.35 0.24 0.17 0.42 ...
##  $ RentHighQ            : num  0.38 0.4 0.27 0.77 0.38 0.68 0.42 0.25 0.16 0.47 ...
##  $ MedRent              : num  0.34 0.37 0.31 0.89 0.38 0.62 0.35 0.24 0.15 0.41 ...
##  $ MedRentPctHousInc    : num  0.38 0.29 0.48 0.63 0.22 0.47 0.46 0.64 0.38 0.23 ...
##  $ MedOwnCostPctInc     : num  0.46 0.32 0.39 0.51 0.51 0.59 0.44 0.59 0.13 0.27 ...
##  $ MedOwnCostPctIncNoMtg: num  0.25 0.18 0.28 0.47 0.21 0.11 0.31 0.28 0.36 0.28 ...
##  $ NumInShelters        : num  0.04 0 0 0 0 0 0 0 0.01 0 ...
##  $ NumStreet            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PctForeignBorn       : num  0.12 0.21 0.14 0.19 0.11 0.7 0.15 0.59 0.01 0.22 ...
##  $ PctBornSameState     : num  0.42 0.5 0.49 0.3 0.72 0.42 0.81 0.58 0.78 0.42 ...
##  $ PctSameHouse85       : num  0.5 0.34 0.54 0.73 0.64 0.49 0.77 0.52 0.48 0.34 ...
##   [list output truncated]

How many records does the state column have?

comunidades %>% # data set
  select(state) %>% # select one or more columns
  nrow() # number of rows

## [1] 1994

comunidades %>% 
  select(state) %>% 
  summary() # generic function that produces summaries of results

##      state      
##  Min.   : 1.00  
##  1st Qu.:12.00  
##  Median :34.00  
##  Mean   :28.68  
##  3rd Qu.:42.00  
##  Max.   :56.00

How many distinct (unique) values does the state column have?

comunidades %>% 
  select(state) %>% 
  unique() %>%  # returns the data frame with duplicated rows removed
  nrow()

## [1] 46

comunidades %>% 
  select(state) %>% 
  duplicated() %>% # flags which rows are duplicates of an earlier row (TRUE/FALSE)
  sum() # total number of duplicated observations

## [1] 1948

How many times does each value of the state column appear?

comunidades %>% 
  select(state) %>% 
  group_by(state) %>% # group by one or more variables
  summarize(n=n()) %>% # collapse each group into a single summary row
  arrange(n) # sort the rows by the values of one or more columns

## # A tibble: 46 x 2
##    state     n
##    <int> <int>
##  1    10     1
##  2    11     1
##  3    20     1
##  4     2     3
##  5    50     4
##  6    32     5
##  7    16     7
##  8    27     7
##  9    56     7
## 10    38     8
## # ... with 36 more rows

library(ggplot2)

comunidades %>% 
  select(state) %>% 
  group_by(state) %>% 
  summarize(NumeroObservaciones=n()) %>% 
  ggplot(aes(NumeroObservaciones)) + # create a new plot and map the number of observations to the x axis
  geom_histogram(bins = 30) # histogram; bins is the number of bins

1.2.2 Content

This analysis looks for possible errors in the individual records of the data source. It identifies which specific rows of a table contain problems and which systemic problems occur across columns.
It is sometimes complemented with business rules that verify the accuracy and integrity of the data.
Examples:
- Phone numbers (field 1) without an area code (field 2).
- First names (field 1) without last names (field 2).
It is also known as a Dependency profile.
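
The first example can be checked by comparing two columns row by row. The sketch below flags rows that have a phone number but no area code; the `contacts` table is hypothetical, and treating empty strings as missing is an assumption.

```r
# Hypothetical contact table (illustrative values)
contacts <- data.frame(
  phone     = c("2345678", "2987654", NA, "2555555"),
  area_code = c("02", NA, NA, ""),
  stringsAsFactors = FALSE
)

# A field counts as present when it is neither NA nor an empty string
has_phone <- !is.na(contacts$phone) & contacts$phone != ""
has_area  <- !is.na(contacts$area_code) & contacts$area_code != ""

# Rows with a phone number but without an area code
incomplete_rows <- which(has_phone & !has_area)
incomplete_rows # rows 2 and 4
```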

1.2.3 Relationships

Also called cross-table analysis or “Join profile”, this consists of identifying how the parts of the data relate to one another. It mainly tries to determine the similarities or differences in syntax and data types between tables.
Examples:
- Key relationships between database tables.
- Redundant data across tables.
- Use of constraints.
Understanding the relationships is crucial for reusing the data.

Note: For a proper relationship analysis, it is advisable to first consolidate related data sources into a single database or, failing that, to take great care when importing the data sets so that the important relationships are preserved.
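
A simple relationship check is to look for keys that appear in one table but not in the other. The sketch below uses base R and two made-up tables, `orders` and `customers`, that share a `customer_id` column.

```r
# Hypothetical tables related through customer_id (illustrative values)
orders <- data.frame(order_id = 1:4, customer_id = c(10, 11, 13, 10))
customers <- data.frame(customer_id = c(10, 11, 12))

# Orders that point to a customer that does not exist
orphan_orders <- orders[!orders$customer_id %in% customers$customer_id, ]

# Customers that never appear in the orders table
unused_customers <- setdiff(customers$customer_id, orders$customer_id)

# The key columns should also have the same data type in both tables
same_type <- identical(class(orders$customer_id), class(customers$customer_id))

orphan_orders
unused_customers
same_type
```

With dplyr loaded, `anti_join(orders, customers, by = "customer_id")` returns the same orphan rows.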

1.2.4 Patterns

This course aims to identify missing-data problems and to prepare the analytical structures for Machine Learning modeling, so we will concentrate on Structure analysis (Column profile) and its main tasks.
Column profiling tasks can be classified as follows:

Cardinality: count the number of elements in a column.
Patterns: search for values that do not match the pattern of a “correct” value; regular expressions are very useful for finding occurrences of correct or incorrect values. This criterion is usually a combination of one or more of the other tasks in this list, for example: length + null values.
Distribution: detect whether the data follow a known distribution or behavior.
Uniqueness: count the number of unique entries in a given column; this criterion is useful, for example, for identifying duplicated values.
Length: check that fields have the correct format in terms of length and data type, for example: phone numbers, postal codes, etc.
Range: compute the minimum and maximum values of the field, not only for numbers but also for text.
Frequency: count the occurrences of particular values.
Null values: count the number of null or empty values to identify incomplete records or corrupted data.
Dependencies: determine whether there are relationships or embedded structures within a data set. For example, a functional dependency between column X and column Y may indicate that the values of both columns are equal or are related by some business rule. Functional dependency analysis can be used to identify redundant data and mapped values, and it helps suggest opportunities for data normalization.
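
Several of these tasks can be gathered into one small helper. The function below is a hypothetical sketch, not part of any package: `profile_column()` and its result names are made up for illustration, and the postal code pattern (exactly six digits) is an assumption.

```r
# Hypothetical helper that profiles a single column (vector)
profile_column <- function(x, pattern = NULL) {
  present <- x[!is.na(x)] # non-missing values only
  out <- list(
    n_rows      = length(x),                     # cardinality
    n_missing   = sum(is.na(x)),                 # null values
    n_distinct  = length(unique(present)),       # uniqueness
    n_repeated  = sum(duplicated(present)),      # repeated entries
    min_value   = if (length(present) > 0) min(present) else NA, # range
    max_value   = if (length(present) > 0) max(present) else NA,
    most_common = if (length(present) > 0) {
      names(which.max(table(present)))           # frequency
    } else {
      NA
    }
  )
  if (!is.null(pattern)) {
    # patterns: non-missing values that do not match the regular expression
    out$n_pattern_mismatch <- sum(!grepl(pattern, present))
  }
  out
}

# Illustrative postal codes; the six-digit pattern is an assumption
postal_codes <- c("170150", "170150", "17015", "090101", NA, "ABC123")
result <- profile_column(postal_codes, pattern = "^[0-9]{6}$")
str(result)
```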

1.2.6 Data preparation

1.3 Practical exercise

1.4 Bibliography

2. Data quality

By the end of this topic, you should be able to:

Understand the concepts of Data Quality.
Identify common data problems and measure data quality.
Apply cleaning techniques with R.

2.1 Introduction

We live in a heterogeneous world, working with different technologies and operating multiple platforms, sensors, and devices that generate an impressive volume of data every second. This is a hard scenario to manage on its own, and even more so when companies lack a clear data management strategy, which often leads to duplicated, inconsistent, ambiguous, and incomplete data that are of little use for operations and decision-making.
Data are said to be the oil of the 21st century because of their business potential and impact on society. Good information is therefore the most valuable asset, while bad information can seriously damage a business and its credibility.
Data quality is a perception or assessment of the fitness of data to serve their purpose in a given context.

The concept of quality is subjective: it depends on each person's own judgment and context when comparing one object with another of the same kind.

“Even though quality cannot be defined, you know what it is” Robert Pirsig

Data quality, then, is a perception or assessment of the fitness of data to serve their purpose in a given context. Because it is a perception, we intuitively think of certain aspects of the data: that they are accurate, complete, up to date, and so on.
This is why data quality is called a “multifaceted” concept: it depends on the dimensions that define it.

“Garbage in, garbage out” Wilf Hey

When it comes to data and systems, managing quality is key. Otherwise we end up with “garbage” analyses that contribute little to the organization's operations and become a source of inaccurate reports and ill-conceived strategies.

“You can have data without information, but you cannot have information without data” Daniel Keys Moran.

An IBM estimate suggests that the annual cost of data quality problems in the United States in 2016 was around USD 3.1 trillion. Business managers' lack of trust in data quality is commonly cited among the main obstacles to decision-making.

2.2 Requirements

2.2.1 Completeness

The proportion of stored data measured against the potential of “100% complete”.

2.2.2 Uniqueness

Nothing is recorded more than once, based on how that thing is identified.

2.2.3 Timeliness

The degree to which data represent reality as of the required point in time.

2.2.4 Validity

Data are valid if they conform to the syntax (format, type, range) of their definition.

2.2.5 Accuracy

The degree to which data correctly describe the “real world” object or event.

2.2.6 Consistency

The absence of difference when two or more representations of a thing are compared against a definition.
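
Some of these dimensions can be turned into simple ratios. The sketch below computes completeness, uniqueness, and validity for a made-up table; the `records` data and the deliberately simplistic e-mail pattern are assumptions for illustration only.

```r
# Hypothetical records (illustrative values)
records <- data.frame(
  id    = c(1, 2, 2, 4),
  email = c("a@x.com", NA, "b@x.com", "not-an-email"),
  stringsAsFactors = FALSE
)

# Completeness: share of stored (non-missing) values
completeness <- mean(!is.na(records$email))

# Uniqueness: distinct identifiers relative to the number of rows
uniqueness <- length(unique(records$id)) / nrow(records)

# Validity: share of stored values that match the expected syntax
# (simplistic pattern, assumed only for this example)
stored <- records$email[!is.na(records$email)]
validity <- mean(grepl("^[^@ ]+@[^@ ]+\\.[^@ ]+$", stored))

c(completeness = completeness, uniqueness = uniqueness, validity = validity)
```

In this example completeness and uniqueness are both 0.75, and validity is 2/3 because one of the three stored addresses does not match the pattern.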

2.3 Tests

2.4 Practical exercise

2.5 Bibliography

3. Imputation

By the end of this topic, the student will be able to:

Understand the concepts of “Missing Data” and “Data Imputation”.
Identify the types of missing data: MCAR, MAR, and MNAR.
Apply several imputation methods in practice.
Learn R libraries that are useful for data imputation.

3.1 Introduction

Missing data (or missing values) occur when no value is stored for a variable in an observation. They are common and can have a significant effect on the conclusions drawn from the data.
In the real world, many data sets contain missing values for a variety of reasons. They are usually encoded as NaN, NULL, blanks, or some other placeholder. Any analysis run on a data set with many missing values has a high probability of reaching distorted conclusions.

Missing values can arise from nonresponse by a unit of analysis, when no information is provided for one or more variables. Some variables are more likely than others to generate nonresponse, typically those involving private information. Income is a clear example.
In longitudinal studies, data are often lost through attrition. This happens when a measurement is repeated after a certain period of time and participants drop out before the study ends, so that one or more measurements are missing.
Data are also often missing in economics, sociology, and political science research because governments or private entities choose not to report critical statistics, fail to report them, or because the information is simply not available.
Missing values can also be caused by the researcher, when methodological errors are made or when data collection and/or entry is carried out incorrectly.

For these reasons, missing data can affect the validity of a study's conclusions in different ways, depending on the type of missing data.

Finally, one way to handle the problem is to discard the observations that have a missing value in any variable; doing so, however, risks losing valuable information. A better strategy is often to impute the missing values, that is, to infer them from the data that are present.

3.2 Types of missing data

3.2.1 Structurally missing data

Structurally missing data are data that are missing for a logical reason, often because of the nature of the business or the normal behavior of the data we are handling. In other words, they are missing because they simply should not exist.
In the example below, the first and third observations have missing values for the variable “EdadHijoMenor” (age of the youngest child). This is because these people have no children (Hijos = NO).

ID	Hijos	EdadHijoMenor
1	NO	NA
2	SI	18
3	NO	NA
4	SI	13
5	SI	8

To handle this situation, people with structurally missing data are usually excluded from any analysis of the affected variable, provided that this is consistent with the logic of the analysis.
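
Using the table above, the exclusion can be sketched as follows. The data frame reproduces the five rows of the example; the variable names `parents`, `mean_age`, and `unexplained` are introduced here only for illustration.

```r
# The example table from this section
people <- data.frame(
  ID            = 1:5,
  Hijos         = c("NO", "SI", "NO", "SI", "SI"),
  EdadHijoMenor = c(NA, 18, NA, 13, 8)
)

# Keep only the people for whom the variable applies
parents <- people[people$Hijos == "SI", ]
mean_age <- mean(parents$EdadHijoMenor)
mean_age # 13

# Missing ages among people who do have children would NOT be structural
unexplained <- sum(people$Hijos == "SI" & is.na(people$EdadHijoMenor))
unexplained # 0
```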

3.2.2 Missing completely at random (MCAR)

For data missing completely at random (MCAR), the assumption is that there is no pattern in how the data are missing. The missing observation could take any value, so the fact that it is missing has no probabilistic relationship with the rest of the data.

3.2.3 Missing at random (MAR)

When the MCAR assumption is too strong, an alternative assumption known as missing at random (MAR) is usually adopted. It supposes that we can predict the missing value from the other data, because the missing observation will take a value according to a probability distribution determined by the rest of the data.

The missing value could therefore be estimated from the other variables. Note that prediction here does not mean that we can predict the value perfectly; all that is required is a probabilistic relationship (that is, a better-than-random chance of predicting the true value of the missing data).

3.2.4 Missing not at random (MNAR)

When we cannot draw conclusions about the likely value of the missing data, and the data are missing for reasons we simply do not understand, we speak of data missing not at random (MNAR, also written NMAR), also known as non-ignorable missing data.
Structurally missing data can be seen as a special case of data missing not at random, but with an important distinction: structurally missing data are easy to analyze, whereas other forms of non-random missingness are highly problematic.
When data are missing not at random, the standard methods for handling missing data (for example, imputation or algorithms designed specifically for missing values) cannot be relied on, because standard calculations will give biased answers.
In these cases we usually cannot reproduce the behavior of the missing data or identify their source, or the data were collected in different, non-comparable time periods; as a result, combining such data in a single analysis could lead to incorrect or misleading conclusions.

Finally, treating them as data missing at random would also be inappropriate.
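
The difference between the three random mechanisms can be seen in a small simulation. This is only a sketch under assumed numbers: the variables `age` and `income`, the 30% missing rate, and the logistic missingness probabilities are all made up for illustration.

```r
set.seed(123)
n <- 1000
age <- round(runif(n, 18, 80))
income <- 500 + 20 * age + rnorm(n, sd = 200) # simulated, fully observed

# MCAR: every income has the same 30% chance of being missing
income_mcar <- ifelse(runif(n) < 0.3, NA, income)

# MAR: the chance of being missing depends only on the observed age
income_mar <- ifelse(runif(n) < plogis((age - 50) / 10), NA, income)

# MNAR: the chance of being missing depends on the income itself
income_mnar <- ifelse(runif(n) < plogis((income - 1500) / 200), NA, income)

# Compare the mean of the observed values under each mechanism
c(
  complete = mean(income),
  mcar     = mean(income_mcar, na.rm = TRUE),
  mar      = mean(income_mar, na.rm = TRUE),
  mnar     = mean(income_mnar, na.rm = TRUE)
)
```

Under MCAR the observed mean should stay close to the complete-data mean. Under MAR the bias can in principle be corrected using `age`, whereas under MNAR the observed data alone do not reveal it.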

3.3 Imputation methods

Now that we know the main types of missing data, we will review some imputation methods for cross-sectional data sets, without going into time series (which require a different treatment).

3.3.1 Do nothing

This is the easiest and fastest method. By doing nothing, we let the algorithm we train handle the missing data.
Some algorithms can take missing values into account and learn the best imputation values for them (or even treat them as a separate category) based on the loss function computed during training; tree-based models are an example. Other algorithms can simply ignore these cases, as regression models do. Depending on the library used, some algorithms will simply not run and will ask us to deal with the missing values first.
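
As an illustration of the last two behaviors, `lm()` in base R drops incomplete rows under the default `na.action` setting (`na.omit`) and stops with an error when `na.action = na.fail` is requested. The missing values below are introduced artificially into a copy of `mtcars`.

```r
# Copy of mtcars with three weights set to missing (artificial example)
cars_na <- mtcars
cars_na$wt[c(1, 5, 9)] <- NA

# With the default na.action (na.omit), rows with missing values are dropped
fit <- lm(mpg ~ wt, data = cars_na)
nobs(fit) # 29 of the 32 rows are used

# With na.action = na.fail, the model refuses to run
strict <- try(lm(mpg ~ wt, data = cars_na, na.action = na.fail), silent = TRUE)
inherits(strict, "try-error") # TRUE
```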

3.3.2 Mean and median

This method can only be used with numeric data.
To impute with the mean or the median, follow these steps:
- Identify the missing values.
- Compute the mean or median of the NON-missing values in a column.
- Replace the missing values with the mean or median of that same column.
The choice depends on the symmetry of the distribution of the observed data. If the distribution is symmetric, the mean and the median are approximately equal, so either value can be used. If the distribution is skewed to the right, the mean will be greater than the median, and the median is preferable so as not to distort the distribution of the data. Finally, if the distribution is skewed to the left, the mean will be less than the median, and the median is again the more suitable measure.
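
The three steps above can be sketched in base R as follows. This is only an illustration: the vector `x` is made-up, right-skewed data, not part of the course dataset, and it shows how a single extreme value pulls the mean away from the median.

```r
# Hypothetical right-skewed variable with two missing values
x <- c(10, 12, 11, 13, 12, 95, NA, NA)

# Steps 1-2: locate the missing values and summarise the observed ones
is_missing <- is.na(x)
x_mean   <- mean(x, na.rm = TRUE)    # pulled upward by the extreme value 95
x_median <- median(x, na.rm = TRUE)  # barely affected by the extreme value

# Step 3: replace the missing values with the chosen statistic
x_imp_mean   <- replace(x, is_missing, x_mean)
x_imp_median <- replace(x, is_missing, x_median)

c(mean = x_mean, median = x_median)
```

Here the mean (25.5) is more than twice the median (12), so imputing with the mean would place the filled-in values far from where most observations lie.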

Its advantages are:
- Easy and fast.
- Works well with small numeric datasets.
Its disadvantages are:
- It does not take correlations between variables into account; it works only at the column level.
- It is not recommended for categorical variables, because it gives poor results on encoded categorical variables.
- It is not very accurate, especially with sparse data.
- It does not account for the uncertainty in the imputations.

3.3.3 Mode

A similar statistical method for imputing missing data is to use the most frequent value (the mode). This technique also works with categorical variables and consists of:
- Identifying the missing values.
- Computing the most frequent value in each column.
- Replacing the missing values with the most frequent value of their column.
Its advantages are:
- Works well with categorical variables.
Its disadvantages are:
- It does not account for correlations between variables.
- It can introduce bias into the data.
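
Base R has no built-in function for the statistical mode, so a minimal sketch needs a small helper. `stat_mode()` and the `payment` vector below are hypothetical names introduced only for illustration.

```r
# Hypothetical helper: most frequent non-missing value of a character vector.
# On ties, the first class in table() order (alphabetical) is returned.
stat_mode <- function(v) {
  counts <- table(v)                 # table() ignores NA by default
  names(counts)[which.max(counts)]
}

# Made-up categorical column with two missing values
payment <- c("Electronic check", "Mailed check", NA,
             "Bank transfer", "Electronic check", NA)

# Replace the missing values with the most frequent class
payment_imp <- replace(payment, is.na(payment), stat_mode(payment))
payment_imp
```

Every missing entry receives the same class, which is why this method can inflate the share of the dominant category.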

3.3.4 Zeros or constants

This is a very simple method. As its name suggests, it replaces the missing values with zero or any other constant we specify.
It is used, for example, when a missing value actually corresponds to a zero.
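
As a minimal sketch, assume a made-up `purchases` column in which a missing entry means the customer bought nothing:

```r
# Hypothetical column: NA means "no purchases recorded", i.e. zero
purchases <- c(3, NA, 1, NA, 7)

# Replace every missing value with the constant 0
purchases_imp <- replace(purchases, is.na(purchases), 0)
purchases_imp
```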

3.3.5 Multivariate

There are now several imputation methods based on supervised and unsupervised learning. Unsupervised models, mainly clustering, are the foundation for many of these methods.
These methods include:
- K-nearest neighbors
- Regression
- Chained equations
- Etc.
In the case of K-nearest neighbors, for example, the basic principle is to find a predefined number of training examples closest to a point and predict the corresponding value from them.
Some of their advantages are:
- They can be much more accurate than simple methods such as the mean, the median, or the mode.
Some of their disadvantages are:
- They tend to be computationally expensive.
- They can introduce multicollinearity.
- Some of these methods can be sensitive to outliers in the data.
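
To make the K-nearest-neighbors idea concrete, here is a small base-R sketch. It is not the implementation of any particular package: `knn_impute()` and the `toy` data are hypothetical, the predictors are assumed to be numeric and complete, and each missing value is filled with the mean of the target among its `k` closest donors.

```r
# Hypothetical helper: impute `target` from the k nearest complete rows
knn_impute <- function(df, target, predictors, k = 3) {
  # Standardise predictors so no single variable dominates the distance
  X <- scale(as.matrix(df[predictors]))
  y <- df[[target]]
  donors <- which(!is.na(y))              # rows with an observed target
  for (i in which(is.na(y))) {
    # Euclidean distance from row i to every donor row
    point <- matrix(X[i, ], nrow = length(donors), ncol = ncol(X), byrow = TRUE)
    d <- sqrt(rowSums((X[donors, , drop = FALSE] - point)^2))
    nearest <- donors[order(d)[seq_len(min(k, length(donors)))]]
    y[i] <- mean(df[[target]][nearest])   # average of the observed neighbours
  }
  df[[target]] <- y
  df
}

# Made-up data: two clearly separated groups of customers
toy <- data.frame(monthly = c(20, 22, 21, 80, 82, 81),
                  total   = c(200, 220, NA, 800, NA, 810))
toy_imp <- knn_impute(toy, "total", "monthly", k = 2)
toy_imp
```

Unlike the column mean, each imputed value here reflects the group the row belongs to, which is the main appeal of multivariate methods.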

3.4 Practical exercise

Next we will go through an example of missing data in R, using tidyverse-compatible tools.

# Install development versions of packages (optional)
# remotes::install_github("njtierney/naniar")
# Load libraries
library(dplyr)
library(visdat)
library(naniar)
library(simputation)

# Load the data
mis_dataset <- read.csv(url("https://raw.githubusercontent.com/dataoptimal/posts/master/data%20cleaning%20with%20R%20and%20the%20tidyverse/telecom.csv"))
head(mis_dataset)

##   customerID MonthlyCharges TotalCharges    PaymentMethod Churn
## 1 7590-VHVEG          29.85        109.9 Electronic check   yes
## 2 5575-GNVDE          56.95           na     Mailed check   yes
## 3 3668-QPYBK             NA       108.15               --   yes
## 4 7795-CFOCW          42.30      1840.75    Bank transfer    no
## 5 9237-HQITU          70.70         <NA> Electronic check    no
## 6 9305-CDSKC            NaN        820.5               --   yes

# Inspect the structure of the data
mis_dataset %>% glimpse()

## Rows: 10
## Columns: 5
## $ customerID     <fct> 7590-VHVEG, 5575-GNVDE, 3668-QPYBK, 7795-CFOCW, 9237-HQ~
## $ MonthlyCharges <dbl> 29.85, 56.95, NA, 42.30, 70.70, NaN, 89.10, NA, 104.80,~
## $ TotalCharges   <fct> 109.9, na, 108.15, 1840.75, NA, 820.5, 1949.4, N/A, 304~
## $ PaymentMethod  <fct> Electronic check, Mailed check, --, Bank transfer, Elec~
## $ Churn          <fct> yes, yes, yes, no, no, yes, no, yes, no, no

# Reload the data, declaring every string that encodes a missing value
mis_dataset <- read.csv(url("https://raw.githubusercontent.com/dataoptimal/posts/master/data%20cleaning%20with%20R%20and%20the%20tidyverse/telecom.csv"), na.strings = c("NaN","NA","N/A","na","--"))
head(mis_dataset)

##   customerID MonthlyCharges TotalCharges    PaymentMethod Churn
## 1 7590-VHVEG          29.85       109.90 Electronic check   yes
## 2 5575-GNVDE          56.95           NA     Mailed check   yes
## 3 3668-QPYBK             NA       108.15             <NA>   yes
## 4 7795-CFOCW          42.30      1840.75    Bank transfer    no
## 5 9237-HQITU          70.70           NA Electronic check    no
## 6 9305-CDSKC            NaN       820.50             <NA>   yes

# Check the column types again
mis_dataset %>% glimpse()

## Rows: 10
## Columns: 5
## $ customerID     <fct> 7590-VHVEG, 5575-GNVDE, 3668-QPYBK, 7795-CFOCW, 9237-HQ~
## $ MonthlyCharges <dbl> 29.85, 56.95, NA, 42.30, 70.70, NaN, 89.10, NA, 104.80,~
## $ TotalCharges   <dbl> 109.90, NA, 108.15, 1840.75, NA, 820.50, 1949.40, NA, 3~
## $ PaymentMethod  <fct> Electronic check, Mailed check, NA, Bank transfer, Elec~
## $ Churn          <fct> yes, yes, yes, no, no, yes, no, yes, no, no

# Visual inspection of missing data
mis_dataset %>% vis_miss()

# Missing cases per variable
mis_dataset %>% gg_miss_var()

# Simple imputation: fill missing TotalCharges with the mean and the median of the observed values
mis_dataset <- mis_dataset %>% 
  mutate(TotalChargesImpMean = replace(TotalCharges, is.na(TotalCharges), mean(TotalCharges, na.rm = TRUE)),
         TotalChargesImpMedian = replace(TotalCharges, is.na(TotalCharges), median(TotalCharges, na.rm = TRUE)))
mis_dataset %>% 
  select(TotalCharges,
         TotalChargesImpMean,
         TotalChargesImpMedian) %>% 
  head()

##   TotalCharges TotalChargesImpMean TotalChargesImpMedian
## 1       109.90             109.900                109.90
## 2           NA            1175.671                820.50
## 3       108.15             108.150                108.15
## 4      1840.75            1840.750               1840.75
## 5           NA            1175.671                820.50
## 6       820.50             820.500                820.50

# Multivariate imputation: predict missing TotalCharges with a linear model (overwrites the column)
mis_dataset <- mis_dataset %>% 
  impute_lm(TotalCharges ~ MonthlyCharges + Churn)
mis_dataset %>% 
  select(TotalCharges, TotalChargesImpMean, TotalChargesImpMedian) %>% 
  head()

##   TotalCharges TotalChargesImpMean TotalChargesImpMedian
## 1     109.9000             109.900                109.90
## 2     828.0137            1175.671                820.50
## 3     108.1500             108.150                108.15
## 4    1840.7500            1840.750               1840.75
## 5    1748.1025            1175.671                820.50
## 6     820.5000             820.500                820.50

3.5 Bibliography

4. Feature engineering

By the end of this chapter, the student will be able to:

Understand the main concepts of feature engineering.
Use the main implementations of feature engineering.
Use tidyverse tools to carry out feature engineering.

4.1 Introduction

Feature engineering rests on extensive theory, so we will present only its basic concepts and some practical tidyverse tools.
Real-world data can be very messy and chaotic, regardless of whether they come from a relational database, a text file, or any other data source.
Although data are generally organized as tables in which each row (unit of analysis) has its own values for a given column (feature), they can be hard to understand and process. To make the data easier for machine learning models to read and to improve model performance, we can perform feature engineering.
In general, data cleaning can be thought of as a process of subtraction and feature engineering as a process of addition. It is often one of the most valuable tasks a data scientist can perform to improve model performance.
In short, feature engineering can be defined as the process of creating and processing new input variables from those already present in a data source.

4.2 Approaches

Feature engineering lets us isolate and highlight key information, helping the algorithms "focus" on what matters. It also lets us contribute our own domain expertise and, once we understand the "vocabulary" of feature engineering, incorporate the domain expertise of other team members.

Its approaches are the following:

Manual
- It is simple.
- It is based on business experience.
- It is tedious and prone to human error.
- We must start from scratch for each dataset.
- It is limited by the time and speed of the person doing it.
Automatic
- It is a reusable approach.
- It allows features to be processed in bulk.
- It filters the data for each label based on the cutoff time.
- It makes it possible to build better predictive models in a fraction of the time.
- It runs the risk of building counterintuitive features.

4.2.1 Feature primitives

In data analysis, many of the operations used to create features are repeated across datasets. Once we realize that these operations do not depend on the underlying data, it is better to abstract the process into a framework that can create features for any dataset.
This is the main idea behind automated feature engineering: we apply the same basic building blocks, called feature primitives, to different datasets to build predictor variables.
For example, a recurring operation in a data model is computing maximum values. We can abstract this process as a feature primitive, max, that can be applied for different purposes and to different datasets.
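
A minimal sketch of this idea in base R: one generic helper that applies any primitive (here `max`) to a grouped column, reused on two unrelated datasets. The helper `apply_primitive()` and both data frames are hypothetical and exist only for illustration.

```r
# Hypothetical helper: apply one primitive (a summary function) per group
apply_primitive <- function(df, group, value, primitive, label) {
  out <- aggregate(df[[value]], by = list(df[[group]]), FUN = primitive)
  names(out) <- c(group, paste(label, value, sep = "_"))
  out
}

# Two made-up datasets from different domains
transactions <- data.frame(customer = c(1, 1, 2, 2, 3),
                           amount   = c(50, 120, 30, 45, 300))
sensors <- data.frame(device      = c("a", "a", "b"),
                      temperature = c(21.5, 23.0, 19.2))

# The same primitive builds a feature in both cases
max_amount <- apply_primitive(transactions, "customer", "amount", max, "max")
max_temp   <- apply_primitive(sensors, "device", "temperature", max, "max")
max_amount
max_temp
```

Swapping `max` for `mean`, `min`, or `length` yields other primitives without changing the helper.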

4.2.2 Cutoff time

The cutoff time specifies the last point in time at which a row's data can be used to compute a feature; any data after that point is filtered out before the features are calculated.
For example, consider a dataset of customer transactions (each with a date and time) where we want to predict whether customers 1, 2, and 3 will spend $500 between 04:00 and 23:59 on 2020-06-01. When we define the features for this problem, we need to make sure that no data from 04:00 onward is used in the calculations.
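
The example can be sketched in base R as follows. The transactions are made up and the time zone is assumed to be UTC; the point is that only rows strictly before the 04:00 cutoff feed the feature.

```r
# Made-up transactions for customers 1, 2 and 3
transactions <- data.frame(
  customer = c(1, 1, 2, 3, 3),
  time = as.POSIXct(c("2020-05-31 18:30:00", "2020-06-01 05:10:00",
                      "2020-06-01 03:59:00", "2020-06-01 04:00:00",
                      "2020-05-30 09:00:00"), tz = "UTC"),
  amount = c(120, 400, 80, 250, 60)
)

# Cutoff time: start of the prediction window
cutoff <- as.POSIXct("2020-06-01 04:00:00", tz = "UTC")

# Keep only data available before the cutoff, then build the feature
history <- transactions[transactions$time < cutoff, ]
spend_before_cutoff <- aggregate(amount ~ customer, data = history, FUN = sum)
spend_before_cutoff
```

The two transactions at or after 04:00 are excluded, so the feature cannot leak information from the period being predicted.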

4.2.3 Domain knowledge

Informative features can often be designed by drawing on our own (or our team's) experience in a specific domain. The goal is to isolate specific information for modeling.
For example, suppose we are working with a real estate dataset (properties and prices) and one of the business specialists recalls that the European housing crisis occurred during the same period. The specialist is warning us that property prices were affected during the crisis, so we can create an indicator variable for the transactions carried out during that period.
Indicator variables are binary variables that tell us whether or not an observation meets a given condition, and they are very useful for isolating key characteristics of the data.
Domain knowledge is broad, open-ended, and very valuable, but it depends on the business experience of the team members. At some point the ideas will run out, and we can then turn to the other technical tools of feature engineering.
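
A minimal sketch of such an indicator variable in base R. The sales data and the crisis window (2008-01-01 to 2012-12-31) are assumptions chosen only for illustration; in practice the specialist would supply the actual dates.

```r
# Made-up property sales
sales <- data.frame(
  sale_date = as.Date(c("2006-03-15", "2008-11-02", "2010-06-20", "2014-01-10")),
  price     = c(210000, 185000, 160000, 230000)
)

# Hypothetical crisis window provided by the domain expert
crisis_start <- as.Date("2008-01-01")
crisis_end   <- as.Date("2012-12-31")

# Indicator variable: 1 if the sale happened during the crisis, 0 otherwise
sales$during_crisis <- as.integer(sales$sale_date >= crisis_start &
                                  sales$sale_date <= crisis_end)
sales
```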

4.2.4 Feature interactions

This technique consists of evaluating whether two or more variables can be combined in a way that makes sense for the business. The combination can be an operation such as a sum, a difference, or a product.
Interaction terms let us model relationships between variables when the effect of one variable on the target is influenced by another.
A general tip is to ask of each set of variables: could this information be combined in a way that is even more useful?
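
The three kinds of operation mentioned above can be illustrated with a made-up housing dataset; all column names here are hypothetical.

```r
# Made-up housing data
homes <- data.frame(
  bedrooms  = c(2, 3, 4),
  bathrooms = c(1, 2, 2),
  year_built = c(1990, 2005, 2015),
  sale_year  = c(2020, 2020, 2021),
  n_schools  = c(1, 3, 2),
  median_school_rating = c(6, 8, 9)
)

# Sum: total number of rooms
homes$total_rooms <- homes$bedrooms + homes$bathrooms
# Difference: age of the property at the time of sale
homes$property_age <- homes$sale_year - homes$year_built
# Product: many schools matter more when they are also good
homes$school_score <- homes$n_schools * homes$median_school_rating

homes[, c("total_rooms", "property_age", "school_score")]
```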

4.2.5 Sparse categorical variables

Sparse classes in categorical variables are those that have very few observations. They can be problematic for certain machine learning algorithms, causing models to overfit.

Some considerations for grouping sparse classes are:

There is no formal rule for how many observations each class needs.
It depends on the size of the dataset.
The particularities of the business must be taken into account.
As a general suggestion (not a rule), classes can be combined until each one has at least about 50 observations.
After combining sparse classes there will be fewer unique classes, but each one will have more observations.

[Figure: sparse categories]

[Figure: regrouped categories]
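
One simple way to regroup sparse classes is to collapse every class below a threshold into a single "Other" class. The helper `lump_sparse()` and the `wall` data below are hypothetical, and the threshold of 50 follows the rule of thumb mentioned above rather than any formal rule.

```r
# Hypothetical helper: collapse classes with fewer than min_n rows into "Other"
lump_sparse <- function(x, min_n = 50, other = "Other") {
  counts <- table(x)
  sparse <- names(counts)[counts < min_n]
  x <- as.character(x)
  x[x %in% sparse] <- other     # missing values are left untouched
  factor(x)
}

# Made-up categorical variable with two sparse classes
wall <- rep(c("Brick", "Siding", "Stucco", "Wood shake", "Concrete block"),
            times = c(120, 90, 60, 8, 5))

table(wall)               # before: five classes, two of them sparse
table(lump_sparse(wall))  # after: four classes
```

Note that the combined class can still fall below the threshold, as it does here; whether that is acceptable depends on the dataset and the business context.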

4.2.6 Dummy variables

Most machine learning algorithms cannot handle categorical variables directly, because they cannot operate on text values. We therefore need to create dummy variables for the categorical variables in our dataset.
Dummy variables are a set of binary (0 or 1) variables, each of which represents a single class of a categorical variable. The information they carry is exactly the same, but the numeric representation lets the algorithm use it.

4.2.7 Unused or redundant variables

Variables of this kind are normally discarded and not used in the analytical model.

Unused variables: those that do not represent business data and have no logical reason to enter the models we train, for example:
- Identifiers.
- Variables that are not available when the algorithm is run.
- Text descriptions.
Redundant variables: these exist when more than one variable represents the same concept and value.
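
As a rough sketch, identifier columns can be dropped by name and exact duplicate columns can be detected with `duplicated()`. The `listings` data frame and the choice of `listing_id` as the identifier are assumptions made for illustration; variables that are redundant without being identical (for example, the same price in two currencies) need a correlation check or domain knowledge instead.

```r
# Made-up data with an identifier and an exact duplicate column
listings <- data.frame(
  listing_id = c("A1", "B2", "C3", "D4"),
  price_usd  = c(100, 250, 175, 300),
  price_copy = c(100, 250, 175, 300),
  bedrooms   = c(2, 3, 2, 4)
)

id_cols <- "listing_id"                          # unused: identifier
is_redundant <- duplicated(as.list(listings))    # TRUE for repeated columns

# Keep only columns that are neither identifiers nor duplicates
model_data <- listings[, !(names(listings) %in% id_cols) & !is_redundant,
                       drop = FALSE]
names(model_data)
```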

4.2.8 Problems with categorical variables

Most machine learning algorithms cannot work with categorical data directly; they require all input and output variables to be numeric.
We must therefore transform categorical variables into a numeric form. If the categorical variable is an output variable, the model's predictions may also need to be converted back into categorical form in order to present them or use them in an application.

There are 2 main types of numeric conversion:

Integer Encoding
One-Hot Encoding

Integer Encoding

It consists of assigning an integer value to each unique category value, for example:

Color: Blue, Red, Green

Category	Value
Blue	1
Red	2
Green	3

For some (ordinal) variables this may be enough. The integer values have a natural ordering among them, and machine learning algorithms may be able to understand and exploit that relationship.
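
In R, integer encoding can be sketched with `factor()` and `as.integer()`. The level order is set explicitly so the codes match the table above; the `color` and `size` vectors are made-up examples.

```r
# Nominal variable: codes follow the order given in `levels`
color <- c("Red", "Blue", "Green", "Blue")
color_int <- as.integer(factor(color, levels = c("Blue", "Red", "Green")))
color_int

# Ordinal variable: here the integer order carries real meaning
size <- c("Medium", "Small", "Large", "Small")
size_int <- as.integer(factor(size, levels = c("Small", "Medium", "Large")))
size_int
```

Without the `levels` argument, `factor()` would order the classes alphabetically, which may not be the order we intend.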

One-Hot Encoding

It is used when no ordinal relationship exists. Using an integer encoding and letting the model assume a natural order among categories can lead to poor performance or unexpected results.
In this case, one-hot encoding can be applied to the integer representation: the integer-encoded variable is removed and a new binary variable is added for each unique integer value.
Color: Blue, Red, Green

Blue	Red	Green
1	0	0
0	1	0
0	0	1
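
One way to obtain this encoding in base R is `model.matrix()` with the intercept removed. This is a sketch on a made-up three-row data frame; the second call shows the reference-coded dummy variant (one column fewer), which many regression routines use to avoid perfectly collinear columns.

```r
# Made-up data: one row per color, levels in the order of the table above
df <- data.frame(color = factor(c("Blue", "Red", "Green"),
                                levels = c("Blue", "Red", "Green")))

# One-hot encoding: one binary column per class (no intercept)
one_hot <- model.matrix(~ color - 1, data = df)
colnames(one_hot) <- levels(df$color)
one_hot

# Reference-coded dummies: the first class (Blue) becomes the baseline
dummies <- model.matrix(~ color, data = df)[, -1, drop = FALSE]
dummies
```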

4.3 Practical exercise

4.4 Bibliography

5. R tools for data profiling

By the end of this topic, the student will be able to:

Know the R tools that make data profiling easier.
Practice cleaning text and date data with stringr and lubridate.
Present reports on the data using R.

Below is a list of software and documents related to automated exploratory data analysis, covering:

Fast data exploration and visualization.
Augmented analytics.
Visualization recommendation and other tools that speed up data exploration (visual exploration in particular).

Information on libraries for automated exploratory analysis is available at https://github.com/mstaniak/autoEDA-resources

5.1 autoEDA - Automated exploratory data analysis

autoEDA aims to automate exploratory data analysis in a univariate or bivariate manner. It can generate plots built with the ggplot2 library and themes inspired by RColorBrewer.
Its main capability is to clean and preprocess the data so that the plots display properly.

#install.packages('devtools')

library(devtools)

#devtools::install_github("XanderHorn/autoEDA")

library(autoEDA)

5.1.1 Univariate analysis example:

overview_1 <-  autoEDA(x = iris)

## autoEDA | Setting color theme 
## autoEDA | Removing constant features 
## autoEDA | 0 constant features removed 
## autoEDA | 0 zero spread features removed 
## autoEDA | Removing features containing majority missing values 
## autoEDA | 0 majority missing features removed 
## autoEDA | Cleaning data 
## autoEDA | Correcting sparse categorical feature levels 
## autoEDA | Performing univariate analysis 
## autoEDA | Visualizing data

overview_1

##        Feature Observations FeatureClass FeatureType PercentageMissing
## 1 Sepal.Length          150      numeric  Continuous                 0
## 2  Sepal.Width          150      numeric  Continuous                 0
## 3 Petal.Length          150      numeric  Continuous                 0
## 4  Petal.Width          150      numeric  Continuous                 0
## 5      Species          150    character Categorical                 0
##   PercentageUnique ConstantFeature ZeroSpreadFeature LowerOutliers
## 1            23.33              No                No             0
## 2            15.33              No                No             1
## 3            28.67              No                No             0
## 4            14.67              No                No             0
## 5             2.00              No                No             0
##   UpperOutliers ImputationValue MinValue FirstQuartile Median Mean   Mode
## 1             0             5.8      4.3           5.1   5.80 5.84      5
## 2             3               3      2.0           2.8   3.00 3.06      3
## 3             0            4.35      1.0           1.6   4.35 3.76    1.4
## 4             0             1.3      0.1           0.3   1.30 1.20    0.2
## 5             0          SETOSA      0.0           0.0   0.00 0.00 SETOSA
##   ThirdQuartile MaxValue LowerOutlierValue UpperOutlierValue
## 1           6.4      7.9              3.15              8.35
## 2           3.3      4.4              2.05              4.05
## 3           5.1      6.9             -3.65             10.35
## 4           1.8      2.5             -1.95              4.05
## 5           0.0      0.0              0.00              0.00

5.1.2 Bivariate regression example:

overview_2 <-  autoEDA(x = iris,
                     y = "Sepal.Length")

## autoEDA | Setting color theme 
## autoEDA | Removing constant features 
## autoEDA | 0 constant features removed 
## autoEDA | Removing zero spread features 
## autoEDA | 0 zero spread features removed 
## autoEDA | Removing features containing majority missing values 
## autoEDA | 0 majority missing features removed 
## autoEDA | Cleaning data 
## autoEDA | Correcting sparse categorical feature levels 
## autoEDA | Sorting features 
## autoEDA | Regression outcome detected 
## autoEDA | Calculating feature predictive power 
## autoEDA | Visualizing data

overview_2

##        Feature Observations FeatureClass FeatureType PercentageMissing
## 1 Petal.Length          150      numeric  Continuous                 0
## 2  Petal.Width          150      numeric  Continuous                 0
## 3 Sepal.Length          150      numeric  Continuous                 0
## 4  Sepal.Width          150      numeric  Continuous                 0
## 5      Species          150    character Categorical                 0
##   PercentageUnique ConstantFeature ZeroSpreadFeature LowerOutliers
## 1            28.67              No                No             0
## 2            14.67              No                No             0
## 3            23.33              No                No             0
## 4            15.33              No                No             1
## 5             2.00              No                No             0
##   UpperOutliers ImputationValue MinValue FirstQuartile Median Mean   Mode
## 1             0            4.35      1.0           1.6   4.35 3.76    1.4
## 2             0             1.3      0.1           0.3   1.30 1.20    0.2
## 3             0             5.8      4.3           5.1   5.80 5.84      5
## 4             3               3      2.0           2.8   3.00 3.06      3
## 5             0          SETOSA      0.0           0.0   0.00 0.00 SETOSA
##   ThirdQuartile MaxValue LowerOutlierValue UpperOutlierValue
## 1           5.1      6.9             -3.65             10.35
## 2           1.8      2.5             -1.95              4.05
## 3           6.4      7.9              3.15              8.35
## 4           3.3      4.4              2.05              4.05
## 5           0.0      0.0              0.00              0.00
##   PredictivePowerPercentage PredictivePower
## 1                        87            High
## 2                        82            High
## 3                         0             Low
## 4                        12             Low
## 5                        78            High

5.1.3 Bivariate classification example:

overview_3 <-  autoEDA(x = iris,
                     y = "Species")

## autoEDA | Setting color theme 
## autoEDA | Removing constant features 
## autoEDA | 0 constant features removed 
## autoEDA | Removing zero spread features 
## autoEDA | 0 zero spread features removed 
## autoEDA | Removing features containing majority missing values 
## autoEDA | 0 majority missing features removed 
## autoEDA | Cleaning data 
## autoEDA | Correcting sparse categorical feature levels 
## autoEDA | Sorting features 
## autoEDA | Multi-class classification outcome detected 
## autoEDA | Calculating feature predictive power 
## autoEDA | Visualizing data

overview_3

##        Feature Observations FeatureClass FeatureType PercentageMissing
## 1 Petal.Length          150      numeric  Continuous                 0
## 2  Petal.Width          150      numeric  Continuous                 0
## 3 Sepal.Length          150      numeric  Continuous                 0
## 4  Sepal.Width          150      numeric  Continuous                 0
## 5      Species          150    character Categorical                 0
##   PercentageUnique ConstantFeature ZeroSpreadFeature LowerOutliers
## 1            28.67              No                No             0
## 2            14.67              No                No             0
## 3            23.33              No                No             0
## 4            15.33              No                No             1
## 5             2.00              No                No             0
##   UpperOutliers ImputationValue MinValue FirstQuartile Median Mean   Mode
## 1             0            4.35      1.0           1.6   4.35 3.76    1.4
## 2             0             1.3      0.1           0.3   1.30 1.20    0.2
## 3             0             5.8      4.3           5.1   5.80 5.84      5
## 4             3               3      2.0           2.8   3.00 3.06      3
## 5             0          SETOSA      0.0           0.0   0.00 0.00 SETOSA
##   ThirdQuartile MaxValue LowerOutlierValue UpperOutlierValue
## 1           5.1      6.9             -3.65             10.35
## 2           1.8      2.5             -1.95              4.05
## 3           6.4      7.9              3.15              8.35
## 4           3.3      4.4              2.05              4.05
## 5           0.0      0.0              0.00              0.00
##   PredictivePowerPercentage PredictivePower
## 1                        86            High
## 2                        88            High
## 3                        46          Medium
## 4                        24             Low
## 5                         0             Low

5.2 arsenal - Statistical reporting easy

The goal of the arsenal library is to make statistical reporting easier.

#install.packages("arsenal")
library(arsenal)

5.2.1 Summarizing variables by categorical variable(s)

tableby() is a function for easily summarizing a set of independent variables by one or more categorical variables.

#install.packages("knitr")
library(knitr)

data(mockstudy) # load the data
dim(mockstudy)  # number of rows and columns

## [1] 1499   14

str(mockstudy) # quick look at the data

## 'data.frame':    1499 obs. of  14 variables:
##  $ case       : int  110754 99706 105271 105001 112263 86205 99508 90158 88989 90515 ...
##  $ age        : int  67 74 50 71 69 56 50 57 51 63 ...
##   ..- attr(*, "label")= chr "Age in Years"
##  $ arm        : chr  "F: FOLFOX" "A: IFL" "A: IFL" "G: IROX" ...
##   ..- attr(*, "label")= chr "Treatment Arm"
##  $ sex        : Factor w/ 2 levels "Male","Female": 1 2 2 2 2 1 1 1 2 1 ...
##  $ race       : chr  "Caucasian" "Caucasian" "Caucasian" "Caucasian" ...
##   ..- attr(*, "label")= chr "Race"
##  $ fu.time    : int  922 270 175 128 233 120 369 421 387 363 ...
##  $ fu.stat    : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ ps         : int  0 1 1 1 0 0 0 0 1 1 ...
##  $ hgb        : num  11.5 10.7 11.1 12.6 13 10.2 13.3 12.1 13.8 12.1 ...
##  $ bmi        : num  25.1 19.5 NA 29.4 26.4 ...
##   ..- attr(*, "label")= chr "Body Mass Index (kg/m^2)"
##  $ alk.phos   : int  160 290 700 771 350 569 162 152 231 492 ...
##  $ ast        : int  35 52 100 68 35 27 16 12 25 18 ...
##  $ mdquality.s: int  NA 1 1 1 NA 1 1 1 1 1 ...
##  $ age.ord    : Ord.factor w/ 8 levels "10-19"<"20-29"<..: 6 7 4 7 6 5 4 5 5 6 ...

To build a table by arm, we use a formula to specify the variables we want to summarize. The following example uses age (a continuous variable) and sex (a categorical variable).

tab1 <- tableby(arm ~ sex + age, data=mockstudy)
tab1

## tableby Object
## 
## Function Call:
## tableby(formula = arm ~ sex + age, data = mockstudy)
## 
## Variable(s):
## arm ~ sex, age

summary(tab1)

## 
## 
## |                            | A: IFL (N=428)  | F: FOLFOX (N=691) | G: IROX (N=380) | Total (N=1499)  | p value|
## |:---------------------------|:---------------:|:-----------------:|:---------------:|:---------------:|-------:|
## |**sex**                     |                 |                   |                 |                 |   0.190|
## |&nbsp;&nbsp;&nbsp;Male      |   277 (64.7%)   |    411 (59.5%)    |   228 (60.0%)   |   916 (61.1%)   |        |
## |&nbsp;&nbsp;&nbsp;Female    |   151 (35.3%)   |    280 (40.5%)    |   152 (40.0%)   |   583 (38.9%)   |        |
## |**Age in Years**            |                 |                   |                 |                 |   0.614|
## |&nbsp;&nbsp;&nbsp;Mean (SD) | 59.673 (11.365) |  60.301 (11.632)  | 59.763 (11.499) | 59.985 (11.519) |        |
## |&nbsp;&nbsp;&nbsp;Range     | 27.000 - 88.000 |  19.000 - 88.000  | 26.000 - 85.000 | 19.000 - 88.000 |        |

tab2 <- as.data.frame(tab1)
tab2

##   group.term   group.label strata.term variable     term        label
## 1        arm Treatment Arm                  sex      sex          sex
## 2        arm Treatment Arm                  sex countpct         Male
## 3        arm Treatment Arm                  sex countpct       Female
## 4        arm Treatment Arm                  age      age Age in Years
## 5        arm Treatment Arm                  age   meansd    Mean (SD)
## 6        arm Treatment Arm                  age    range        Range
##   variable.type              A: IFL           F: FOLFOX            G: IROX
## 1   categorical                                                           
## 2   categorical 277.00000, 64.71963 411.00000, 59.47902            228, 60
## 3   categorical 151.00000, 35.28037 280.00000, 40.52098            152, 40
## 4       numeric                                                           
## 5       numeric  59.67290, 11.36454  60.30101, 11.63225 59.76316, 11.49930
## 6       numeric              27, 88              19, 88             26, 85
##                Total                       test   p.value
## 1                    Pearson's Chi-squared test 0.1904388
## 2  916.0000, 61.1074 Pearson's Chi-squared test 0.1904388
## 3  583.0000, 38.8926 Pearson's Chi-squared test 0.1904388
## 4                            Linear Model ANOVA 0.6143859
## 5 59.98532, 11.51877         Linear Model ANOVA 0.6143859
## 6             19, 88         Linear Model ANOVA 0.6143859

For more information, see the tableby vignette of the arsenal package.

5.2.2 Summarizing variables across time points

paired() is a function for easily summarizing a set of independent variables at two time points.

dat <- data.frame(
  tp = paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)),
  id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6),
  Cat = c("A", "A", "A", "B", "B", "B", "B", "A", NA, "B"),
  Fac = factor(c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A")),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA),
  Ord = ordered(c("I", "II", "II", "III", "III", "III", "I", "III", "II", "I")),
  Lgl = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
  Dat = as.Date("2018-05-01") + c(1, 1, 2, 2, 3, 4, 5, 6, 3, 4),
  stringsAsFactors = FALSE
)


p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id, signed.rank.exact = FALSE)

summary(p)

## 
## 
## |                            |   Time Point 1 (N=4)    |   Time Point 2 (N=4)    | Difference (N=4) | p value|
## |:---------------------------|:-----------------------:|:-----------------------:|:----------------:|-------:|
## |**Cat**                     |                         |                         |                  |   1.000|
## |&nbsp;&nbsp;&nbsp;A         |        2 (50.0%)        |        2 (50.0%)        |    1 (50.0%)     |        |
## |&nbsp;&nbsp;&nbsp;B         |        2 (50.0%)        |        2 (50.0%)        |    1 (50.0%)     |        |
## |**Fac**                     |                         |                         |                  |   0.261|
## |&nbsp;&nbsp;&nbsp;A         |        2 (50.0%)        |        1 (25.0%)        |    2 (100.0%)    |        |
## |&nbsp;&nbsp;&nbsp;B         |        1 (25.0%)        |        2 (50.0%)        |    1 (100.0%)    |        |
## |&nbsp;&nbsp;&nbsp;C         |        1 (25.0%)        |        1 (25.0%)        |    1 (100.0%)    |        |
## |**Num**                     |                         |                         |                  |   0.391|
## |&nbsp;&nbsp;&nbsp;Mean (SD) |      2.750 (1.258)      |      3.250 (0.957)      |  0.500 (1.000)   |        |
## |&nbsp;&nbsp;&nbsp;Range     |      1.000 - 4.000      |      2.000 - 4.000      |  -1.000 - 1.000  |        |
## |**Ord**                     |                         |                         |                  |   0.174|
## |&nbsp;&nbsp;&nbsp;I         |        2 (50.0%)        |        0 (0.0%)         |    2 (100.0%)    |        |
## |&nbsp;&nbsp;&nbsp;II        |        1 (25.0%)        |        1 (25.0%)        |    1 (100.0%)    |        |
## |&nbsp;&nbsp;&nbsp;III       |        1 (25.0%)        |        3 (75.0%)        |     0 (0.0%)     |        |
## |**Lgl**                     |                         |                         |                  |   1.000|
## |&nbsp;&nbsp;&nbsp;FALSE     |        2 (50.0%)        |        1 (25.0%)        |    2 (100.0%)    |        |
## |&nbsp;&nbsp;&nbsp;TRUE      |        2 (50.0%)        |        3 (75.0%)        |    1 (50.0%)     |        |
## |**Dat**                     |                         |                         |                  |   0.182|
## |&nbsp;&nbsp;&nbsp;Median    |       2018-05-03        |       2018-05-04        |      0.500       |        |
## |&nbsp;&nbsp;&nbsp;Range     | 2018-05-02 - 2018-05-06 | 2018-05-02 - 2018-05-07 |  0.000 - 1.000   |        |

summary(p)

	Time Point 1 (N=4)	Time Point 2 (N=4)	Difference (N=4)	p value
Cat				1.000
A	2 (50.0%)	2 (50.0%)	1 (50.0%)
B	2 (50.0%)	2 (50.0%)	1 (50.0%)
Fac				0.261
A	2 (50.0%)	1 (25.0%)	2 (100.0%)
B	1 (25.0%)	2 (50.0%)	1 (100.0%)
C	1 (25.0%)	1 (25.0%)	1 (100.0%)
Num				0.391
Mean (SD)	2.750 (1.258)	3.250 (0.957)	0.500 (1.000)
Range	1.000 - 4.000	2.000 - 4.000	-1.000 - 1.000
Ord				0.174
I	2 (50.0%)	0 (0.0%)	2 (100.0%)
II	1 (25.0%)	1 (25.0%)	1 (100.0%)
III	1 (25.0%)	3 (75.0%)	0 (0.0%)
Lgl				1.000
FALSE	2 (50.0%)	1 (25.0%)	2 (100.0%)
TRUE	2 (50.0%)	3 (75.0%)	1 (50.0%)
Dat				0.182
Median	2018-05-03	2018-05-04	0.500
Range	2018-05-02 - 2018-05-06	2018-05-02 - 2018-05-07	0.000 - 1.000
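
Every column in the table reports N=4 even though dat has six distinct ids. This is consistent with only the ids observed at both time points entering the paired comparison (ids 5 and 6 each appear once). The base R sketch below is a hand check under that assumption, not a description of how arsenal computes the table internally. It rebuilds the Num row from a reduced copy of dat; the names t1, t2 and paired_rows are illustrative.

```r
# Reduced copy of the example data: only the columns needed for the Num row
dat <- data.frame(
  tp  = paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)),
  id  = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA)
)

# Split by time point, then keep only ids present in both (inner join on id)
t1 <- dat[dat$tp == "Time Point 1", c("id", "Num")]
t2 <- dat[dat$tp == "Time Point 2", c("id", "Num")]
paired_rows <- merge(t1, t2, by = "id", suffixes = c("_t1", "_t2"))

# Within-subject change from time point 1 to time point 2
paired_rows$diff <- paired_rows$Num_t2 - paired_rows$Num_t1

nrow(paired_rows)       # number of complete pairs
mean(paired_rows$diff)  # mean of the differences
sd(paired_rows$diff)    # standard deviation of the differences
```

With this data the sketch gives 4 complete pairs, a mean difference of 0.5 and a standard deviation of 1, which agrees with the Difference column of the Num row above.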

For more information, see here.

5.2.3 Fitting and summarizing models

modelsum() is a function for fitting and summarizing a model for each independent variable against one or more response variables, with options to adjust each model for covariates.

tab3 <- modelsum(bmi ~ sex + age, data=mockstudy)

summary(tab3, text=TRUE)

## 
## 
## |             |estimate |std.error |p.value |adj.r.squared |Nmiss |
## |:------------|:--------|:---------|:-------|:-------------|:-----|
## |(Intercept)  |27.491   |0.181     |< 0.001 |0.004         |33    |
## |sex Female   |-0.731   |0.290     |0.012   |              |      |
## |(Intercept)  |26.424   |0.752     |< 0.001 |0.000         |33    |
## |Age in Years |0.013    |0.012     |0.290   |              |      |

summary(tab3)

	estimate	std.error	p.value	adj.r.squared	Nmiss
(Intercept)	27.491	0.181	< 0.001	0.004	33
sex Female	-0.731	0.290	0.012
(Intercept)	26.424	0.752	< 0.001	0.000	33
Age in Years	0.013	0.012	0.290
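
The two (Intercept) rows in the output indicate that two separate models were fitted, one per independent variable. The same idea can be sketched in base R with lm(). This is only an illustration of "one model per predictor", not arsenal's implementation, and it uses the built-in mtcars data with arbitrarily chosen variables (mpg, wt, hp) rather than mockstudy.

```r
# One simple linear model per independent variable, response kept fixed.
# mtcars ships with base R; the variable choice is purely illustrative.
predictors <- c("wt", "hp")

fits <- lapply(predictors, function(v) {
  lm(reformulate(v, response = "mpg"), data = mtcars)
})
names(fits) <- predictors

# Collect the slope row of each model into one small summary table
model_table <- do.call(rbind, lapply(predictors, function(v) {
  s  <- summary(fits[[v]])
  co <- s$coefficients
  data.frame(
    term          = v,
    estimate      = co[v, "Estimate"],
    std.error     = co[v, "Std. Error"],
    p.value       = co[v, "Pr(>|t|)"],
    adj.r.squared = s$adj.r.squared
  )
}))

print(model_table, row.names = FALSE)
```

Each row of model_table comes from its own model, so the estimates are unadjusted for the other predictor.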

For more information, see here.

5.2.4 Comparing two tables

comparedf() compares two tables and reports any differences between them.

df1 <- data.frame(id = paste0("person", 1:3),
                  a = c("a", "b", "c"),
                  b = c(1, 3, 4),
                  c = c("f", "e", "d"),
                  row.names = paste0("rn", 1:3),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = paste0("person", 3:1),
                  a = c("c", "b", "a"),
                  b = c(1, 3, 4),
                  d = paste0("rn", 1:3),
                  row.names = paste0("rn", c(1,3,2)),
                  stringsAsFactors = FALSE)

comparedf(df1, df2)

## Compare Object
## 
## Function Call: 
## comparedf(x = df1, y = df2)
## 
## Shared: 3 non-by variables and 3 observations.
## Not shared: 2 variables and 0 observations.
## 
## Differences found in 2/3 variables compared.
## 0 variables compared have non-identical attributes.

summary(comparedf(df1, df2))

## 
## 
## Table: Summary of data.frames
## 
## version   arg    ncol   nrow
## --------  ----  -----  -----
## x         df1       4      3
## y         df2       4      3
## 
## 
## 
## Table: Summary of overall comparison
## 
## statistic                                                      value
## ------------------------------------------------------------  ------
## Number of by-variables                                             0
## Number of non-by variables in common                               3
## Number of variables compared                                       3
## Number of variables in x but not y                                 1
## Number of variables in y but not x                                 1
## Number of variables compared with some values unequal              2
## Number of variables compared with all values equal                 1
## Number of observations in common                                   3
## Number of observations in x but not y                              0
## Number of observations in y but not x                              0
## Number of observations with some compared variables unequal        2
## Number of observations with all compared variables equal           1
## Number of values unequal                                           4
## 
## 
## 
## Table: Variables not shared
## 
## version   variable    position  class     
## --------  ---------  ---------  ----------
## x         c                  4  character 
## y         d                  4  character 
## 
## 
## 
## Table: Other variables not compared
## 
## |                                |
## |:-------------------------------|
## |No other variables not compared |
## 
## 
## 
## Table: Observations not shared
## 
## |                           |
## |:--------------------------|
## |No observations not shared |
## 
## 
## 
## Table: Differences detected by variable
## 
## var.x   var.y     n   NAs
## ------  ------  ---  ----
## id      id        2     0
## a       a         2     0
## b       b         0     0
## 
## 
## 
## Table: Differences detected
## 
## var.x   var.y    ..row.names..  values.x   values.y    row.x   row.y
## ------  ------  --------------  ---------  ---------  ------  ------
## id      id                   1  person1    person3         1       1
## id      id                   3  person3    person1         3       3
## a       a                    1  a          c               1       1
## a       a                    3  c          a               3       3
## 
## 
## 
## Table: Non-identical attributes
## 
## |                            |
## |:---------------------------|
## |No non-identical attributes |
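
In the differences reported above, row.x equals row.y in every line, which suggests the rows were paired by position rather than by the id column. The arsenal documentation describes a by argument of comparedf() for matching on a key instead. The sketch below does not rely on it; it uses base R merge() on reduced copies of df1 and df2 to show how the picture changes once rows are matched on id. The column names a_differs and b_differs are illustrative.

```r
# Reduced copies of the two tables (only the shared columns)
df1 <- data.frame(id = paste0("person", 1:3),
                  a = c("a", "b", "c"),
                  b = c(1, 3, 4),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = paste0("person", 3:1),
                  a = c("c", "b", "a"),
                  b = c(1, 3, 4),
                  stringsAsFactors = FALSE)

# Match rows on the key instead of on row position
matched <- merge(df1, df2, by = "id", suffixes = c(".x", ".y"))

# Flag value-level differences for each shared variable
matched$a_differs <- matched$a.x != matched$a.y
matched$b_differs <- matched$b.x != matched$b.y

matched[, c("id", "a_differs", "b_differs")]
```

Once matched on id, variable a agrees for every person, while b differs for person1 and person3. This is the opposite of the position-based result, where id and a differed and b did not.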

For more information, see here.

5.3 DataExplorer - Exploratory Data Analysis (EDA)

Exploratory data analysis (EDA) is the initial, and an important, phase of data analysis and predictive modeling. During this process, analysts and modelers take a first look at the data, generate relevant hypotheses and decide on the next steps.
However, the EDA process can be cumbersome at times. The DataExplorer R package aims to automate most of the data handling and visualization, so that users can concentrate on studying the data and extracting insights.
Running create_report(dataset) generates an HTML report in the working directory, named report.html.
Important: set the working directory first, by navigating in the Files pane to the folder where the report should be saved and then calling setwd().

#install.packages("DataExplorer")

library(DataExplorer)
data(airquality) # load the dataset

#create_report(airquality)

The report looks like this: [Figure: screenshot of the generated report]
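
Separately from the generated report, a few of the same basics (dimensions, variable types and missing values) can be checked by hand. The following base R sketch is not DataExplorer's implementation, only a minimal manual profile of airquality; the profile data frame and its column names are illustrative.

```r
data(airquality)

# One row per variable: its class and how many values are missing
profile <- data.frame(
  variable  = names(airquality),
  class     = vapply(airquality, function(col) class(col)[1], character(1)),
  n_missing = colSums(is.na(airquality)),
  row.names = NULL
)
profile$pct_missing <- round(100 * profile$n_missing / nrow(airquality), 1)

dim(airquality)  # rows and columns
profile
```

In airquality only Ozone and Solar.R contain missing values, so these are the columns to inspect first in the report's missing-data section.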

5.4 janitor - cleaning dirty data

The janitor package is an R package with simple functions for examining and cleaning dirty data. It was built with beginner and intermediate R users in mind and is optimized for ease of use.
The main functions of janitor:
- Format the column names of the dataset.
- Isolate duplicate records.
- Provide quick tabulations (that is, frequency tables and cross-tabulations).
- Other janitor functions format the results of these tabulations nicely. Together, these tabulation and reporting functions approximate popular features of SPSS and Microsoft Excel.

Data used in the examples: MyMSA

#install.packages("janitor")
#install.packages("readxl")
library(janitor)
library(readxl)

mymsa = read_excel("data/mymsa.xlsx")

x = janitor::clean_names(mymsa)

data.frame(mymsa = colnames(mymsa), x = colnames(x))

##                mymsa                    x
## 1               RFID                 rfid
## 2              Plant                plant
## 3           KillDate            kill_date
## 4             BodyNo              body_no
## 5   LeftSideScanTime  left_side_scan_time
## 6  RightSideScanTime right_side_scan_time
## 7         HangMethod          hang_method
## 8                Hgp                  hgp
## 9                Sex                  sex
## 10          LeftHscw            left_hscw
## 11         RightHscw           right_hscw
## 12         TotalHscw           total_hscw
## 13             P8Fat                p8fat
## 14               Lot                  lot
## 15          Est % BI       est_percent_bi
## 16          HumpCold            hump_cold
## 17               Ema                  ema
## 18  OssificationCold    ossification_cold
## 19       AusMarbling         aus_marbling
## 20       MsaMarbling         msa_marbling
## 21        MeatColour          meat_colour
## 22         FatColour           fat_colour
## 23        RibfatCold          ribfat_cold
## 24                Ph                   ph
## 25          LoinTemp            loin_temp
## 26          FeedType            feed_type
## 27      NoDaysOnFeed      no_days_on_feed
## 28          MSAIndex            msa_index
## 29             spare                spare

x %>% tabyl(meat_colour) # returns a frequency table

##  meat_colour    n percent
##           1B   87 0.02175
##           1C  657 0.16425
##            2 1730 0.43250
##            3 1478 0.36950
##            4   30 0.00750
##            5   14 0.00350
##            6    4 0.00100

x %>% 
  tabyl(meat_colour) %>% 
  adorn_pct_formatting(digits = 0, affix_sign = TRUE) # format proportions as percentages

##  meat_colour    n percent
##           1B   87      2%
##           1C  657     16%
##            2 1730     43%
##            3 1478     37%
##            4   30      1%
##            5   14      0%
##            6    4      0%

x %>% tabyl(spare)

##  spare    n percent valid_percent
##     NA 4000       1            NA

x = remove_empty(x, which = c("rows","cols")) # removes columns that are completely empty and rows that are completely empty

x = read_excel("data/mymsa.xlsx") %>% 
  clean_names() %>% remove_empty() # this can also be done right when reading the data
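
What "completely empty" means can be sketched in base R: a row or column is dropped only when every one of its values is NA. This is an illustration of the idea on a toy data frame, not janitor's implementation; toy, keep_rows and keep_cols are hypothetical names.

```r
# Toy data: column b and row 2 are entirely missing
toy <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, NA),
  c = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

# Keep rows and columns that have at least one non-missing value
keep_rows <- rowSums(!is.na(toy)) > 0
keep_cols <- colSums(!is.na(toy)) > 0
cleaned <- toy[keep_rows, keep_cols, drop = FALSE]

cleaned
```

Rows and columns that are only partly missing are kept, which is why the spare column (all 4000 values NA) is removed from x while the other columns are not.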

x %>% tabyl(meat_colour, plant) # cross-tabulation

##  meat_colour    1   2
##           1B    0  87
##           1C   87 570
##            2 1443 287
##            3 1477   1
##            4   27   3
##            5    9   5
##            6    1   3

# totals row
x %>% 
  tabyl(meat_colour, plant) %>% 
  adorn_totals(where = "row")

##  meat_colour    1   2
##           1B    0  87
##           1C   87 570
##            2 1443 287
##            3 1477   1
##            4   27   3
##            5    9   5
##            6    1   3
##        Total 3044 956

# totals column
x %>% 
  tabyl(meat_colour, plant) %>% 
  adorn_totals(where = "col")

##  meat_colour    1   2 Total
##           1B    0  87    87
##           1C   87 570   657
##            2 1443 287  1730
##            3 1477   1  1478
##            4   27   3    30
##            5    9   5    14
##            6    1   3     4

# totals row and totals column
x %>% 
  tabyl(meat_colour, plant) %>% 
  adorn_totals(where = c("row","col"))

##  meat_colour    1   2 Total
##           1B    0  87    87
##           1C   87 570   657
##            2 1443 287  1730
##            3 1477   1  1478
##            4   27   3    30
##            5    9   5    14
##            6    1   3     4
##        Total 3044 956  4000

x %>% 
  tabyl(meat_colour, plant) %>% 
  adorn_totals(where = c("row","col")) %>% 
  adorn_percentages(denominator = "col") %>% 
  adorn_pct_formatting(digits = 0)  # with percentages

##  meat_colour    1    2 Total
##           1B   0%   9%    2%
##           1C   3%  60%   16%
##            2  47%  30%   43%
##            3  49%   0%   37%
##            4   1%   0%    1%
##            5   0%   1%    0%
##            6   0%   0%    0%
##        Total 100% 100%  100%

# counts shown alongside the percentages

x %>% 
  tabyl(meat_colour, plant) %>% 
  adorn_totals(where = c("row","col")) %>% 
  adorn_percentages(denominator = "col") %>% 
  adorn_pct_formatting(digits = 0) %>% 
  adorn_ns(position = "front")

##  meat_colour           1          2       Total
##           1B    0   (0%)  87   (9%)   87   (2%)
##           1C   87   (3%) 570  (60%)  657  (16%)
##            2 1443  (47%) 287  (30%) 1730  (43%)
##            3 1477  (49%)   1   (0%) 1478  (37%)
##            4   27   (1%)   3   (0%)   30   (1%)
##            5    9   (0%)   5   (1%)   14   (0%)
##            6    1   (0%)   3   (0%)    4   (0%)
##        Total 3044 (100%) 956 (100%) 4000 (100%)
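
The column percentages above can be verified by hand from the counts in the cross-tabulation. The sketch below re-enters those counts as a plain matrix (so it does not need the MyMSA file) and uses base R prop.table(); it is a cross-check, not the way janitor computes them.

```r
# Counts copied from the meat_colour by plant cross-tabulation above
counts <- matrix(
  c(0, 87, 1443, 1477, 27, 9, 1,   # plant 1
    87, 570, 287, 1, 3, 5, 3),     # plant 2
  ncol = 2,
  dimnames = list(
    meat_colour = c("1B", "1C", "2", "3", "4", "5", "6"),
    plant       = c("1", "2")
  )
)

col_totals <- colSums(counts)                            # denominators
col_pct    <- round(100 * prop.table(counts, margin = 2))  # percent within each plant

col_totals
col_pct
```

With denominator = "col", each cell is divided by its plant total (3044 or 956), so every plant column sums to 100% up to rounding.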

# check whether there are duplicates

x %>% get_dupes(rfid)

## # A tibble: 0 x 29
## # ... with 29 variables: rfid <chr>, dupe_count <int>, plant <dbl>,
## #   kill_date <dttm>, body_no <dbl>, left_side_scan_time <dbl>,
## #   right_side_scan_time <dbl>, hang_method <chr>, hgp <chr>, sex <chr>,
## #   left_hscw <dbl>, right_hscw <dbl>, total_hscw <dbl>, p8fat <dbl>,
## #   lot <dbl>, est_percent_bi <chr>, hump_cold <dbl>, ema <dbl>,
## #   ossification_cold <dbl>, aus_marbling <dbl>, msa_marbling <dbl>,
## #   meat_colour <chr>, fat_colour <dbl>, ribfat_cold <dbl>, ph <dbl>,
## #   loin_temp <dbl>, feed_type <chr>, no_days_on_feed <dbl>, msa_index <dbl>

# create artificial duplicates
library(dplyr) # provides slice() and bind_rows()

x1 = x %>% slice(1:3)
x2 = bind_rows(x1,x)
x2 %>% get_dupes(rfid)

## # A tibble: 6 x 29
##   rfid           dupe_count plant kill_date           body_no left_side_scan_ti~
##   <chr>               <int> <dbl> <dttm>                <dbl>              <dbl>
## 1 201 553126081~          2     1 2016-08-15 00:00:00     193                423
## 2 201 553126081~          2     1 2016-08-15 00:00:00     193                423
## 3 253 120151214~          2     1 2016-08-15 00:00:00     257                542
## 4 253 120151214~          2     1 2016-08-15 00:00:00     257                542
## 5 818 415178538~          2     1 2016-08-02 00:00:00      99                445
## 6 818 415178538~          2     1 2016-08-02 00:00:00      99                445
## # ... with 23 more variables: right_side_scan_time <dbl>, hang_method <chr>,
## #   hgp <chr>, sex <chr>, left_hscw <dbl>, right_hscw <dbl>, total_hscw <dbl>,
## #   p8fat <dbl>, lot <dbl>, est_percent_bi <chr>, hump_cold <dbl>, ema <dbl>,
## #   ossification_cold <dbl>, aus_marbling <dbl>, msa_marbling <dbl>,
## #   meat_colour <chr>, fat_colour <dbl>, ribfat_cold <dbl>, ph <dbl>,
## #   loin_temp <dbl>, feed_type <chr>, no_days_on_feed <dbl>, msa_index <dbl>
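
The logic of isolating duplicates on a key can be sketched in base R with duplicated(). This is an illustration on made-up records, not janitor's implementation. The data and the names records, dupe_keys and dupes are hypothetical; the dupe_count column is named after the one in the output above.

```r
# Made-up records: rfid "A1" and "B2" each occur twice
records <- data.frame(
  rfid  = c("A1", "B2", "C3", "A1", "D4", "B2"),
  plant = c(1, 1, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Keys that occur more than once
dupe_keys <- unique(records$rfid[duplicated(records$rfid)])

# Keep every row carrying a duplicated key, not only the repeats
dupes <- records[records$rfid %in% dupe_keys, ]

# Number of occurrences of each key, attached to every row
dupes$dupe_count <- as.integer(table(dupes$rfid)[dupes$rfid])

# Sort so that duplicated records sit next to each other
dupes <- dupes[order(dupes$rfid), ]
dupes
```

Note that duplicated() alone flags only the second and later occurrences; the %in% step is what brings back the first occurrence as well, so both copies can be inspected side by side.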

# Have you ever read data from Excel and seen a value like 42223 where a date should be? This function converts those serial numbers to the Date class.

excel_numeric_to_date(41103)

## [1] "2012-07-13"
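
The same conversion can be checked in base R. The sketch assumes the workbook uses Excel's 1900 date system, for which the conventional origin in R is 1899-12-30; workbooks saved with the 1904 date system would need a different origin.

```r
# Excel serial number taken from the example above
serial <- 41103

# Assumption: 1900 date system, so day counts start from 1899-12-30
converted <- as.Date(serial, origin = "1899-12-30")

converted
```

This returns the same date as excel_numeric_to_date(41103), namely 2012-07-13.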

6. Data extraction

6.1 Banco Central del Ecuador

The rio package provides streamlined data import and export: web-based import is supported natively (including over SSL / HTTPS), compressed files can be read directly without explicit decompression, and fast import packages are used where appropriate.

#install.packages("rio")
library(rio)
library(janitor)
library(tidyverse)

# Data from the Banco Central del Ecuador ====

urlData <- "https://contenido.bce.fin.ec/documentos/Estadisticas/SectorReal/CuentasProvinciales/Can2019.xlsx" 

data <- import(urlData, sheet = "VAB CANTONAL", skip = 6, col_names = TRUE) # import the dataset

data = data %>% 
  pivot_longer(-c(PROVINCIA,"CÓDIGO PROVINCIA","CANTÓN","CÓDIGO CANTÓN"),
               names_to="Sector",values_to="VAB") %>% 
  mutate(Sector = str_to_sentence(str_squish(str_replace_all(Sector, "\r|\n", "")))) %>% 
  mutate(PROVINCIA = str_to_sentence(PROVINCIA))


data <- clean_names(dat = data,case = "upper_camel")

data = data %>% filter(!is.na(CodigoCanton))

data %>% glimpse()

## Rows: 3,315
## Columns: 6
## $ Provincia       <chr> "Azuay", "Azuay", "Azuay", "Azuay", "Azuay", "Azuay", ~
## $ CodigoProvincia <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01", ~
## $ Canton          <chr> "Cuenca", "Cuenca", "Cuenca", "Cuenca", "Cuenca", "Cue~
## $ CodigoCanton    <chr> "0101", "0101", "0101", "0101", "0101", "0101", "0101"~
## $ Sector          <chr> "Agricultura, ganadería, silvicultura y pesca", "Explo~
## $ Vab             <dbl> 92901.6333, 69016.8075, 902215.8441, 77681.5980, 83595~

6.2 World Bank

The World Bank makes a large amount of data from the World Development Indicators (WDI) available through its web API. The WDI package for R makes it easy to search for and download data series from the WDI.

#install.packages('WDI')
library(WDI)
library(ggplot2)

WDIsearch('gdp')

##        indicator                   
##   [1,] "5.51.01.10.gdp"            
##   [2,] "6.0.GDP_current"           
##   [3,] "6.0.GDP_growth"            
##   [4,] "6.0.GDP_usd"               
##   [5,] "6.0.GDPpc_constant"        
##   [6,] "BG.GSR.NFSV.GD.ZS"         
##   [7,] "BG.KAC.FNEI.GD.PP.ZS"      
##   [8,] "BG.KAC.FNEI.GD.ZS"         
##   [9,] "BG.KLT.DINV.GD.PP.ZS"      
##  [10,] "BG.KLT.DINV.GD.ZS"         
##  [11,] "BI.WAG.TOTL.GD.ZS"         
##  [12,] "BM.GSR.MRCH.ZS"            
##  [13,] "BM.KLT.DINV.GD.ZS"         
##  [14,] "BM.KLT.DINV.WD.GD.ZS"      
##  [15,] "BN.CAB.XOKA.GD.ZS"         
##  [16,] "BN.CAB.XOKA.GDP.ZS"        
##  [17,] "BN.CAB.XOTR.ZS"            
##  [18,] "BN.CUR.GDPM.ZS"            
##  [19,] "BN.GSR.FCTY.CD.ZS"         
##  [20,] "BN.KLT.DINV.CD.ZS"         
##  [21,] "BN.KLT.DINV.DRS.GDP.ZS"    
##  [22,] "BN.KLT.PRVT.GD.ZS"         
##  [23,] "BN.TRF.CURR.CD.ZS"         
##  [24,] "BX.GSR.MRCH.ZS"            
##  [25,] "BX.KLT.DINV.DT.GD.ZS"      
##  [26,] "BX.KLT.DINV.WD.GD.ZS"      
##  [27,] "BX.TRF.MGR.DT.GD.ZS"       
##  [28,] "BX.TRF.PWKR.DT.GD.ZS"      
##  [29,] "BX.TRF.PWKR.GD.ZS"         
##  [30,] "CM.FIN.INTL.GD.ZS"         
##  [31,] "CM.MKT.LCAP.GD.ZS"         
##  [32,] "CM.MKT.TRAD.GD.ZS"         
##  [33,] "DP.DOD.DECD.CR.BC.Z1"      
##  [34,] "DP.DOD.DECD.CR.CG.Z1"      
##  [35,] "DP.DOD.DECD.CR.FC.Z1"      
##  [36,] "DP.DOD.DECD.CR.GG.Z1"      
##  [37,] "DP.DOD.DECD.CR.NF.Z1"      
##  [38,] "DP.DOD.DECF.CR.BC.Z1"      
##  [39,] "DP.DOD.DECF.CR.CG.Z1"      
##  [40,] "DP.DOD.DECF.CR.FC.Z1"      
##  [41,] "DP.DOD.DECF.CR.GG.Z1"      
##  [42,] "DP.DOD.DECF.CR.NF.Z1"      
##  [43,] "DP.DOD.DECN.CR.BC.Z1"      
##  [44,] "DP.DOD.DECN.CR.CG.Z1"      
##  [45,] "DP.DOD.DECN.CR.FC.Z1"      
##  [46,] "DP.DOD.DECN.CR.GG.Z1"      
##  [47,] "DP.DOD.DECN.CR.NF.Z1"      
##  [48,] "DP.DOD.DECT.CR.BC.Z1"      
##  [49,] "DP.DOD.DECT.CR.CG.Z1"      
##  [50,] "DP.DOD.DECT.CR.FC.Z1"      
##  [51,] "DP.DOD.DECT.CR.GG.Z1"      
##  [52,] "DP.DOD.DECT.CR.NF.Z1"      
##  [53,] "DP.DOD.DECX.CR.BC.Z1"      
##  [54,] "DP.DOD.DECX.CR.CG.Z1"      
##  [55,] "DP.DOD.DECX.CR.FC.Z1"      
##  [56,] "DP.DOD.DECX.CR.GG.Z1"      
##  [57,] "DP.DOD.DECX.CR.NF.Z1"      
##  [58,] "DP.DOD.DLCD.CR.BC.Z1"      
##  [59,] "DP.DOD.DLCD.CR.CG.Z1"      
##  [60,] "DP.DOD.DLCD.CR.FC.Z1"      
##  [61,] "DP.DOD.DLCD.CR.GG.Z1"      
##  [62,] "DP.DOD.DLCD.CR.L1.BC.Z1"   
##  [63,] "DP.DOD.DLCD.CR.L1.CG.Z1"   
##  [64,] "DP.DOD.DLCD.CR.L1.FC.Z1"   
##  [65,] "DP.DOD.DLCD.CR.L1.GG.Z1"   
##  [66,] "DP.DOD.DLCD.CR.L1.NF.Z1"   
##  [67,] "DP.DOD.DLCD.CR.M1.BC.Z1"   
##  [68,] "DP.DOD.DLCD.CR.M1.CG.Z1"   
##  [69,] "DP.DOD.DLCD.CR.M1.FC.Z1"   
##  [70,] "DP.DOD.DLCD.CR.M1.GG.Z1"   
##  [71,] "DP.DOD.DLCD.CR.M1.NF.Z1"   
##  [72,] "DP.DOD.DLCD.CR.NF.Z1"      
##  [73,] "DP.DOD.DLD1.CR.CG.Z1"      
##  [74,] "DP.DOD.DLD1.CR.GG.Z1"      
##  [75,] "DP.DOD.DLD2.CR.CG.Z1"      
##  [76,] "DP.DOD.DLD2.CR.GG.Z1"      
##  [77,] "DP.DOD.DLD2A.CR.CG.Z1"     
##  [78,] "DP.DOD.DLD2A.CR.GG.Z1"     
##  [79,] "DP.DOD.DLD3.CR.CG.Z1"      
##  [80,] "DP.DOD.DLD3.CR.GG.Z1"      
##  [81,] "DP.DOD.DLD4.CR.CG.Z1"      
##  [82,] "DP.DOD.DLD4.CR.GG.Z1"      
##  [83,] "DP.DOD.DLDS.CR.BC.Z1"      
##  [84,] "DP.DOD.DLDS.CR.CG.Z1"      
##  [85,] "DP.DOD.DLDS.CR.FC.Z1"      
##  [86,] "DP.DOD.DLDS.CR.GG.Z1"      
##  [87,] "DP.DOD.DLDS.CR.L1.BC.Z1"   
##  [88,] "DP.DOD.DLDS.CR.L1.CG.Z1"   
##  [89,] "DP.DOD.DLDS.CR.L1.FC.Z1"   
##  [90,] "DP.DOD.DLDS.CR.L1.GG.Z1"   
##  [91,] "DP.DOD.DLDS.CR.L1.NF.Z1"   
##  [92,] "DP.DOD.DLDS.CR.M1.BC.Z1"   
##  [93,] "DP.DOD.DLDS.CR.M1.CG.Z1"   
##  [94,] "DP.DOD.DLDS.CR.M1.FC.Z1"   
##  [95,] "DP.DOD.DLDS.CR.M1.GG.Z1"   
##  [96,] "DP.DOD.DLDS.CR.M1.NF.Z1"   
##  [97,] "DP.DOD.DLDS.CR.MV.BC.Z1"   
##  [98,] "DP.DOD.DLDS.CR.MV.CG.Z1"   
##  [99,] "DP.DOD.DLDS.CR.MV.FC.Z1"   
## [100,] "DP.DOD.DLDS.CR.MV.GG.Z1"   
## [101,] "DP.DOD.DLDS.CR.MV.NF.Z1"   
## [102,] "DP.DOD.DLDS.CR.NF.Z1"      
## [103,] "DP.DOD.DLIN.CR.BC.Z1"      
## [104,] "DP.DOD.DLIN.CR.CG.Z1"      
## [105,] "DP.DOD.DLIN.CR.FC.Z1"      
## [106,] "DP.DOD.DLIN.CR.GG.Z1"      
## [107,] "DP.DOD.DLIN.CR.L1.BC.Z1"   
## [108,] "DP.DOD.DLIN.CR.L1.CG.Z1"   
## [109,] "DP.DOD.DLIN.CR.L1.FC.Z1"   
## [110,] "DP.DOD.DLIN.CR.L1.GG.Z1"   
## [111,] "DP.DOD.DLIN.CR.L1.NF.Z1"   
## [112,] "DP.DOD.DLIN.CR.M1.BC.Z1"   
## [113,] "DP.DOD.DLIN.CR.M1.CG.Z1"   
## [114,] "DP.DOD.DLIN.CR.M1.FC.Z1"   
## [115,] "DP.DOD.DLIN.CR.M1.GG.Z1"   
## [116,] "DP.DOD.DLIN.CR.M1.NF.Z1"   
## [117,] "DP.DOD.DLIN.CR.NF.Z1"      
## [118,] "DP.DOD.DLLO.CR.BC.Z1"      
## [119,] "DP.DOD.DLLO.CR.CG.Z1"      
## [120,] "DP.DOD.DLLO.CR.FC.Z1"      
## [121,] "DP.DOD.DLLO.CR.GG.Z1"      
## [122,] "DP.DOD.DLLO.CR.L1.BC.Z1"   
## [123,] "DP.DOD.DLLO.CR.L1.CG.Z1"   
## [124,] "DP.DOD.DLLO.CR.L1.FC.Z1"   
## [125,] "DP.DOD.DLLO.CR.L1.GG.Z1"   
## [126,] "DP.DOD.DLLO.CR.L1.NF.Z1"   
## [127,] "DP.DOD.DLLO.CR.M1.BC.Z1"   
## [128,] "DP.DOD.DLLO.CR.M1.CG.Z1"   
## [129,] "DP.DOD.DLLO.CR.M1.FC.Z1"   
## [130,] "DP.DOD.DLLO.CR.M1.GG.Z1"   
## [131,] "DP.DOD.DLLO.CR.M1.NF.Z1"   
## [132,] "DP.DOD.DLLO.CR.NF.Z1"      
## [133,] "DP.DOD.DLOA.CR.BC.Z1"      
## [134,] "DP.DOD.DLOA.CR.CG.Z1"      
## [135,] "DP.DOD.DLOA.CR.FC.Z1"      
## [136,] "DP.DOD.DLOA.CR.GG.Z1"      
## [137,] "DP.DOD.DLOA.CR.L1.BC.Z1"   
## [138,] "DP.DOD.DLOA.CR.L1.CG.Z1"   
## [139,] "DP.DOD.DLOA.CR.L1.FC.Z1"   
## [140,] "DP.DOD.DLOA.CR.L1.GG.Z1"   
## [141,] "DP.DOD.DLOA.CR.L1.NF.Z1"   
## [142,] "DP.DOD.DLOA.CR.M1.BC.Z1"   
## [143,] "DP.DOD.DLOA.CR.M1.CG.Z1"   
## [144,] "DP.DOD.DLOA.CR.M1.FC.Z1"   
## [145,] "DP.DOD.DLOA.CR.M1.GG.Z1"   
## [146,] "DP.DOD.DLOA.CR.M1.NF.Z1"   
## [147,] "DP.DOD.DLOA.CR.NF.Z1"      
## [148,] "DP.DOD.DLSD.CR.BC.Z1"      
## [149,] "DP.DOD.DLSD.CR.CG.Z1"      
## [150,] "DP.DOD.DLSD.CR.FC.Z1"      
## [151,] "DP.DOD.DLSD.CR.GG.Z1"      
## [152,] "DP.DOD.DLSD.CR.M1.BC.Z1"   
## [153,] "DP.DOD.DLSD.CR.M1.CG.Z1"   
## [154,] "DP.DOD.DLSD.CR.M1.FC.Z1"   
## [155,] "DP.DOD.DLSD.CR.M1.GG.Z1"   
## [156,] "DP.DOD.DLSD.CR.M1.NF.Z1"   
## [157,] "DP.DOD.DLSD.CR.NF.Z1"      
## [158,] "DP.DOD.DLTC.CR.BC.Z1"      
## [159,] "DP.DOD.DLTC.CR.CG.Z1"      
## [160,] "DP.DOD.DLTC.CR.FC.Z1"      
## [161,] "DP.DOD.DLTC.CR.GG.Z1"      
## [162,] "DP.DOD.DLTC.CR.L1.BC.Z1"   
## [163,] "DP.DOD.DLTC.CR.L1.CG.Z1"   
## [164,] "DP.DOD.DLTC.CR.L1.FC.Z1"   
## [165,] "DP.DOD.DLTC.CR.L1.GG.Z1"   
## [166,] "DP.DOD.DLTC.CR.L1.NF.Z1"   
## [167,] "DP.DOD.DLTC.CR.M1.BC.Z1"   
## [168,] "DP.DOD.DLTC.CR.M1.CG.Z1"   
## [169,] "DP.DOD.DLTC.CR.M1.FC.Z1"   
## [170,] "DP.DOD.DLTC.CR.M1.GG.Z1"   
## [171,] "DP.DOD.DLTC.CR.M1.NF.Z1"   
## [172,] "DP.DOD.DLTC.CR.NF.Z1"      
## [173,] "DP.DOD.DSCD.CR.BC.Z1"      
## [174,] "DP.DOD.DSCD.CR.CG.Z1"      
## [175,] "DP.DOD.DSCD.CR.FC.Z1"      
## [176,] "DP.DOD.DSCD.CR.GG.Z1"      
## [177,] "DP.DOD.DSCD.CR.NF.Z1"      
## [178,] "DP.DOD.DSDS.CR.BC.Z1"      
## [179,] "DP.DOD.DSDS.CR.CG.Z1"      
## [180,] "DP.DOD.DSDS.CR.FC.Z1"      
## [181,] "DP.DOD.DSDS.CR.GG.Z1"      
## [182,] "DP.DOD.DSDS.CR.NF.Z1"      
## [183,] "DP.DOD.DSIN.CR.BC.Z1"      
## [184,] "DP.DOD.DSIN.CR.CG.Z1"      
## [185,] "DP.DOD.DSIN.CR.FC.Z1"      
## [186,] "DP.DOD.DSIN.CR.GG.Z1"      
## [187,] "DP.DOD.DSIN.CR.NF.Z1"      
## [188,] "DP.DOD.DSLO.CR.BC.Z1"      
## [189,] "DP.DOD.DSLO.CR.CG.Z1"      
## [190,] "DP.DOD.DSLO.CR.FC.Z1"      
## [191,] "DP.DOD.DSLO.CR.GG.Z1"      
## [192,] "DP.DOD.DSLO.CR.NF.Z1"      
## [193,] "DP.DOD.DSOA.CR.BC.Z1"      
## [194,] "DP.DOD.DSOA.CR.CG.Z1"      
## [195,] "DP.DOD.DSOA.CR.FC.Z1"      
## [196,] "DP.DOD.DSOA.CR.GG.Z1"      
## [197,] "DP.DOD.DSOA.CR.NF.Z1"      
## [198,] "DP.DOD.DSTC.CR.BC.Z1"      
## [199,] "DP.DOD.DSTC.CR.CG.Z1"      
## [200,] "DP.DOD.DSTC.CR.FC.Z1"      
## [201,] "DP.DOD.DSTC.CR.GG.Z1"      
## [202,] "DP.DOD.DSTC.CR.NF.Z1"      
## [203,] "DT.DOD.ALLC.ZSG"           
## [204,] "DT.DOD.ALLN.ZSG"           
## [205,] "DT.DOD.DECT.CD.ZSG"        
## [206,] "DT.ODA.ALLD.GD.ZS"         
## [207,] "DT.ODA.DACD.ZSG"           
## [208,] "DT.ODA.MULT.ZSG"           
## [209,] "DT.ODA.NDAC.ZSG"           
## [210,] "DT.ODA.ODAT.GD.ZS"         
## [211,] "DT.TDS.DECT.GD.ZS"         
## [212,] "EG.EGY.PRIM.PP.KD"         
## [213,] "EG.GDP.PUSE.KO.87"         
## [214,] "EG.GDP.PUSE.KO.KD"         
## [215,] "EG.GDP.PUSE.KO.PP"         
## [216,] "EG.GDP.PUSE.KO.PP.KD"      
## [217,] "EG.USE.COMM.GD.PP.KD"      
## [218,] "EN.ATM.CO2E.GDP"           
## [219,] "EN.ATM.CO2E.KD.87.GD"      
## [220,] "EN.ATM.CO2E.KD.GD"         
## [221,] "EN.ATM.CO2E.PP.GD"         
## [222,] "EN.ATM.CO2E.PP.GD.KD"      
## [223,] "ER.GDP.FWTL.M3.KD"         
## [224,] "EU.EGY.USES.GDP"           
## [225,] "FB.DPT.INSU.PC.ZS"         
## [226,] "FD.AST.PRVT.GD.ZS"         
## [227,] "FI.RES.TOTL.CD.ZS"         
## [228,] "FM.AST.GOVT.CN.ZS"         
## [229,] "FM.LBL.BMNY.GD.ZS"         
## [230,] "FM.LBL.MQMY.GD.ZS"         
## [231,] "FM.LBL.MQMY.GDP.ZS"        
## [232,] "FM.LBL.MQMY.XD"            
## [233,] "FM.LBL.QMNY.GDP.ZS"        
## [234,] "FM.LBL.SEIG.GDP.ZS"        
## [235,] "FS.AST.CGOV.GD.ZS"         
## [236,] "FS.AST.DOMO.GD.ZS"         
## [237,] "FS.AST.DOMS.GD.ZS"         
## [238,] "FS.AST.DTOT.ZS"            
## [239,] "FS.AST.PRVT.GD.ZS"         
## [240,] "FS.AST.PRVT.GDP.ZS"        
## [241,] "FS.LBL.LIQU.GD.ZS"         
## [242,] "FS.LBL.LIQU.GDP.ZS"        
## [243,] "FS.LBL.QLIQ.GD.ZS"         
## [244,] "GB.BAL.OVRL.GD.ZS"         
## [245,] "GB.BAL.OVRL.GDP.ZS"        
## [246,] "GB.DOD.TOTL.GD.ZS"         
## [247,] "GB.DOD.TOTL.GDP.ZS"        
## [248,] "GB.FIN.ABRD.GD.ZS"         
## [249,] "GB.FIN.ABRD.GDP.ZS"        
## [250,] "GB.FIN.DOMS.GD.ZS"         
## [251,] "GB.FIN.DOMS.GDP.ZS"        
## [252,] "GB.REV.CTOT.GD.ZS"         
## [253,] "GB.REV.TOTL.GDP.ZS"        
## [254,] "GB.REV.XAGT.CN.ZS"         
## [255,] "GB.RVC.TOTL.GD.ZS"         
## [256,] "GB.SOE.DECT.ZS"            
## [257,] "GB.SOE.ECON.GD.ZS"         
## [258,] "GB.SOE.ECON.GDP.ZS"        
## [259,] "GB.SOE.NFLW.GD.ZS"         
## [260,] "GB.SOE.NFLW.GDP.ZS"        
## [261,] "GB.SOE.OVRL.GD.ZS"         
## [262,] "GB.TAX.TOTL.GD.ZS"         
## [263,] "GB.TAX.TOTL.GDP.ZS"        
## [264,] "GB.XPD.DEFN.GDP.ZS"        
## [265,] "GB.XPD.RSDV.GD.ZS"         
## [266,] "GB.XPD.TOTL.GD.ZS"         
## [267,] "GB.XPD.TOTL.GDP.ZS"        
## [268,] "GC.AST.TOTL.GD.ZS"         
## [269,] "GC.BAL.CASH.GD.ZS"         
## [270,] "GC.DOD.TOTL.GD.ZS"         
## [271,] "GC.FIN.DOMS.GD.ZS"         
## [272,] "GC.FIN.FRGN.GD.ZS"         
## [273,] "GC.LBL.TOTL.GD.ZS"         
## [274,] "GC.NFN.TOTL.GD.ZS"         
## [275,] "GC.NLD.TOTL.GD.ZS"         
## [276,] "GC.REV.XGRT.GD.ZS"         
## [277,] "GC.TAX.TOTL.GD.ZS"         
## [278,] "GC.XPN.TOTL.GD.ZS"         
## [279,] "GD.ZS"                     
## [280,] "GFDD.DI.01"                
## [281,] "GFDD.DI.02"                
## [282,] "GFDD.DI.03"                
## [283,] "GFDD.DI.05"                
## [284,] "GFDD.DI.06"                
## [285,] "GFDD.DI.07"                
## [286,] "GFDD.DI.08"                
## [287,] "GFDD.DI.09"                
## [288,] "GFDD.DI.10"                
## [289,] "GFDD.DI.11"                
## [290,] "GFDD.DI.12"                
## [291,] "GFDD.DI.13"                
## [292,] "GFDD.DI.14"                
## [293,] "GFDD.DM.01"                
## [294,] "GFDD.DM.02"                
## [295,] "GFDD.DM.03"                
## [296,] "GFDD.DM.04"                
## [297,] "GFDD.DM.05"                
## [298,] "GFDD.DM.06"                
## [299,] "GFDD.DM.07"                
## [300,] "GFDD.DM.08"                
## [301,] "GFDD.DM.09"                
## [302,] "GFDD.DM.10"                
## [303,] "GFDD.DM.11"                
## [304,] "GFDD.DM.12"                
## [305,] "GFDD.DM.13"                
## [306,] "GFDD.EI.08"                
## [307,] "GFDD.OI.02"                
## [308,] "GFDD.OI.08"                
## [309,] "GFDD.OI.09"                
## [310,] "GFDD.OI.13"                
## [311,] "GFDD.OI.14"                
## [312,] "GFDD.OI.17"                
## [313,] "GFDD.OI.18"                
## [314,] "IE.ICT.TOTL.GD.ZS"         
## [315,] "IS.RRS.GOOD.KM.PP.ZS"      
## [316,] "IS.RRS.PASG.K2.PP.ZS"      
## [317,] "IT.TEL.REVN.GD.ZS"         
## [318,] "MS.MIL.XPND.GD.ZS"         
## [319,] "NA.GDP.ACC.FB.SNA08.CR"    
## [320,] "NA.GDP.ACC.FB.SNA08.KR"    
## [321,] "NA.GDP.AGR.CR"             
## [322,] "NA.GDP.AGR.KR"             
## [323,] "NA.GDP.AGR.SNA08.CR"       
## [324,] "NA.GDP.AGR.SNA08.KR"       
## [325,] "NA.GDP.BUSS.SNA08.CR"      
## [326,] "NA.GDP.BUSS.SNA08.KR"      
## [327,] "NA.GDP.CNST.CR"            
## [328,] "NA.GDP.CNST.KR"            
## [329,] "NA.GDP.CNST.SNA08.CR"      
## [330,] "NA.GDP.CNST.SNA08.KR"      
## [331,] "NA.GDP.EDUS.SNA08.CR"      
## [332,] "NA.GDP.EDUS.SNA08.KR"      
## [333,] "NA.GDP.ELEC.GAS.SNA08.CR"  
## [334,] "NA.GDP.ELEC.GAS.SNA08.KR"  
## [335,] "NA.GDP.EXC.OG.CR"          
## [336,] "NA.GDP.EXC.OG.KR"          
## [337,] "NA.GDP.FINS.CR"            
## [338,] "NA.GDP.FINS.KR"            
## [339,] "NA.GDP.FINS.SNA08.CR"      
## [340,] "NA.GDP.FINS.SNA08.KR"      
## [341,] "NA.GDP.HLTH.SOCW.SNA08.CR" 
## [342,] "NA.GDP.HLTH.SOCW.SNA08.KR" 
## [343,] "NA.GDP.INC.OG.CR"          
## [344,] "NA.GDP.INC.OG.KR"          
## [345,] "NA.GDP.INC.OG.SNA08.CR"    
## [346,] "NA.GDP.INC.OG.SNA08.KR"    
## [347,] "NA.GDP.INF.COMM.SNA08.CR"  
## [348,] "NA.GDP.INF.COMM.SNA08.KR"  
## [349,] "NA.GDP.MINQ.CR"            
## [350,] "NA.GDP.MINQ.KR"            
## [351,] "NA.GDP.MINQ.SNA08.CR"      
## [352,] "NA.GDP.MINQ.SNA08.KR"      
## [353,] "NA.GDP.MNF.CR"             
## [354,] "NA.GDP.MNF.KR"             
## [355,] "NA.GDP.MNF.SNA08.CR"       
## [356,] "NA.GDP.MNF.SNA08.KR"       
## [357,] "NA.GDP.PADM.DEF.SNA08.CR"  
## [358,] "NA.GDP.PADM.DEF.SNA08.KR"  
## [359,] "NA.GDP.REST.SNA08.CR"      
## [360,] "NA.GDP.REST.SNA08.KR"      
## [361,] "NA.GDP.SRV.OTHR.CR"        
## [362,] "NA.GDP.SRV.OTHR.KR"        
## [363,] "NA.GDP.SRV.OTHR.SNA08.CR"  
## [364,] "NA.GDP.SRV.OTHR.SNA08.KR"  
## [365,] "NA.GDP.TRAN.COMM.CR"       
## [366,] "NA.GDP.TRAN.COMM.KR"       
## [367,] "NA.GDP.TRAN.STOR.SNA08.CR" 
## [368,] "NA.GDP.TRAN.STOR.SNA08.KR" 
## [369,] "NA.GDP.TRD.HTL.CR"         
## [370,] "NA.GDP.TRD.HTL.KR"         
## [371,] "NA.GDP.TRD.SNA08.CR"       
## [372,] "NA.GDP.TRD.SNA08.KR"       
## [373,] "NA.GDP.UTL.CR"             
## [374,] "NA.GDP.UTL.KR"             
## [375,] "NA.GDP.WTR.WST.SNA08.CR"   
## [376,] "NA.GDP.WTR.WST.SNA08.KR"   
## [377,] "NE.CON.GOVT.ZS"            
## [378,] "NE.CON.PETC.ZS"            
## [379,] "NE.CON.PRVT.ZS"            
## [380,] "NE.CON.TETC.ZS"            
## [381,] "NE.CON.TOTL.ZG"            
## [382,] "NE.CON.TOTL.ZS"            
## [383,] "NE.DAB.TOTL.ZS"            
## [384,] "NE.EXP.GNFS.ZS"            
## [385,] "NE.GDI.CON.GOVT.CR"        
## [386,] "NE.GDI.CON.GOVT.SNA08.CR"  
## [387,] "NE.GDI.CON.NPI.CR"         
## [388,] "NE.GDI.CON.NPI.SNA08.CR"   
## [389,] "NE.GDI.CON.PRVT.CR"        
## [390,] "NE.GDI.CON.PRVT.SNA08.CR"  
## [391,] "NE.GDI.EXPT.CR"            
## [392,] "NE.GDI.EXPT.SNA08.CR"      
## [393,] "NE.GDI.FPRV.ZS"            
## [394,] "NE.GDI.FPUB.ZS"            
## [395,] "NE.GDI.FTOT.CR"            
## [396,] "NE.GDI.FTOT.SNA08.CR"      
## [397,] "NE.GDI.FTOT.ZS"            
## [398,] "NE.GDI.IMPT.CR"            
## [399,] "NE.GDI.IMPT.SNA08.CR"      
## [400,] "NE.GDI.INEX.SNA08.CR"      
## [401,] "NE.GDI.STKB.CR"            
## [402,] "NE.GDI.STKB.SNA08.CR"      
## [403,] "NE.GDI.TOTL.CR"            
## [404,] "NE.GDI.TOTL.SNA08.CR"      
## [405,] "NE.GDI.TOTL.ZG"            
## [406,] "NE.GDI.TOTL.ZS"            
## [407,] "NE.IMP.GNFS.ZS"            
## [408,] "NE.MRCH.GDP.ZS"            
## [409,] "NE.RSB.GNFS.ZG"            
## [410,] "NE.RSB.GNFS.ZS"            
## [411,] "NE.TRD.GNFS.ZS"            
## [412,] "NP.AGR.TOTL.ZG"            
## [413,] "NP.IND.TOTL.ZG"            
## [414,] "NP.SRV.TOTL.ZG"            
## [415,] "NV.AGR.PCAP.KD.ZG"         
## [416,] "NV.AGR.TOTL.ZG"            
## [417,] "NV.AGR.TOTL.ZS"            
## [418,] "NV.IND.MANF.ZS"            
## [419,] "NV.IND.TOTL.ZG"            
## [420,] "NV.IND.TOTL.ZS"            
## [421,] "NV.SRV.DISC.CD"            
## [422,] "NV.SRV.DISC.CN"            
## [423,] "NV.SRV.DISC.KN"            
## [424,] "NV.SRV.TETC.ZG"            
## [425,] "NV.SRV.TETC.ZS"            
## [426,] "NV.SRV.TOTL.ZS"            
## [427,] "NY.AGR.SUBS.GD.ZS"         
## [428,] "NY.GDP.COAL.RT.ZS"         
## [429,] "NY.GDP.DEFL.87.ZG"         
## [430,] "NY.GDP.DEFL.KD.ZG"         
## [431,] "NY.GDP.DEFL.KD.ZG.AD"      
## [432,] "NY.GDP.DEFL.ZS"            
## [433,] "NY.GDP.DEFL.ZS.87"         
## [434,] "NY.GDP.DEFL.ZS.AD"         
## [435,] "NY.GDP.DISC.CD"            
## [436,] "NY.GDP.DISC.CN"            
## [437,] "NY.GDP.DISC.KN"            
## [438,] "NY.GDP.FCST.KD.87"         
## [439,] "NY.GDP.FCST.KN.87"         
## [440,] "NY.GDP.FRST.RT.ZS"         
## [441,] "NY.GDP.MINR.RT.ZS"         
## [442,] "NY.GDP.MKTP.CD"            
## [443,] "NY.GDP.MKTP.CD.XD"         
## [444,] "NY.GDP.MKTP.CN"            
## [445,] "NY.GDP.MKTP.CN.AD"         
## [446,] "NY.GDP.MKTP.CN.XD"         
## [447,] "NY.GDP.MKTP.IN"            
## [448,] "NY.GDP.MKTP.KD"            
## [449,] "NY.GDP.MKTP.KD.87"         
## [450,] "NY.GDP.MKTP.KD.ZG"         
## [451,] "NY.GDP.MKTP.KN"            
## [452,] "NY.GDP.MKTP.KN.87"         
## [453,] "NY.GDP.MKTP.KN.87.ZG"      
## [454,] "NY.GDP.MKTP.PP.CD"         
## [455,] "NY.GDP.MKTP.PP.KD"         
## [456,] "NY.GDP.MKTP.PP.KD.87"      
## [457,] "NY.GDP.MKTP.XD"            
## [458,] "NY.GDP.MKTP.XU.E"          
## [459,] "NY.GDP.NGAS.RT.ZS"         
## [460,] "NY.GDP.PCAP.CD"            
## [461,] "NY.GDP.PCAP.CN"            
## [462,] "NY.GDP.PCAP.KD"            
## [463,] "NY.GDP.PCAP.KD.ZG"         
## [464,] "NY.GDP.PCAP.KN"            
## [465,] "NY.GDP.PCAP.PP.CD"         
## [466,] "NY.GDP.PCAP.PP.KD"         
## [467,] "NY.GDP.PCAP.PP.KD.87"      
## [468,] "NY.GDP.PCAP.PP.KD.ZG"      
## [469,] "NY.GDP.PETR.RT.ZS"         
## [470,] "NY.GDP.TOTL.RT.ZS"         
## [471,] "NY.GDS.TOTL.ZS"            
## [472,] "NY.GEN.AEDU.GD.ZS"         
## [473,] "NY.GEN.DCO2.GD.ZS"         
## [474,] "NY.GEN.DFOR.GD.ZS"         
## [475,] "NY.GEN.DKAP.GD.ZS"         
## [476,] "NY.GEN.DMIN.GD.ZS"         
## [477,] "NY.GEN.DNGY.GD.ZS"         
## [478,] "NY.GEN.NDOM.GD.ZS"         
## [479,] "NY.GEN.SVNG.GD.ZS"         
## [480,] "NY.GNS.ICTR.ZS"            
## [481,] "NYGDPMKTPKDZ"              
## [482,] "NYGDPMKTPSACD"             
## [483,] "NYGDPMKTPSACN"             
## [484,] "NYGDPMKTPSAKD"             
## [485,] "NYGDPMKTPSAKN"             
## [486,] "PA.NUS.PPP"                
## [487,] "PA.NUS.PPP.05"             
## [488,] "PA.NUS.PPPC.RF"            
## [489,] "S02"                       
## [490,] "SE.XPD.EDUC.ZS"            
## [491,] "SE.XPD.PRIM.GDP.ZS"        
## [492,] "SE.XPD.PRIM.PC.ZS"         
## [493,] "SE.XPD.SECO.GDP.ZS"        
## [494,] "SE.XPD.SECO.PC.ZS"         
## [495,] "SE.XPD.TERT.GDP.ZS"        
## [496,] "SE.XPD.TERT.PC.ZS"         
## [497,] "SE.XPD.TOTL.GD.ZS"         
## [498,] "SF.TRN.RAIL.KM.ZS"         
## [499,] "SH.XPD.CHEX.GD.ZS"         
## [500,] "SH.XPD.GHED.GD.ZS"         
## [501,] "SH.XPD.HLTH.ZS"            
## [502,] "SH.XPD.KHEX.GD.ZS"         
## [503,] "SH.XPD.PRIV.ZS"            
## [504,] "SH.XPD.PUBL.ZS"            
## [505,] "SH.XPD.TOTL.ZS"            
## [506,] "SL.GDP.PCAP.EM.KD"         
## [507,] "SL.GDP.PCAP.EM.KD.ZG"      
## [508,] "SL.GDP.PCAP.EM.XD"         
## [509,] "TG.VAL.TOTL.GD.PP.ZS"      
## [510,] "TG.VAL.TOTL.GD.ZS"         
## [511,] "TG.VAL.TOTL.GG.ZS"         
## [512,] "UIS.XGDP.0.FSGOV"          
## [513,] "UIS.XGDP.02.FSGOV.FFNTR"   
## [514,] "UIS.XGDP.1.FSGOV"          
## [515,] "UIS.XGDP.1.FSGOV.FFNTR"    
## [516,] "UIS.XGDP.1.FSHH.FFNTR"     
## [517,] "UIS.XGDP.2.FSGOV"          
## [518,] "UIS.XGDP.2.FSGOV.FFNTR"    
## [519,] "UIS.XGDP.23.FSGOV"         
## [520,] "UIS.XGDP.23.FSHH.FFNTR"    
## [521,] "UIS.XGDP.2T3.FSGOV.FFNTR"  
## [522,] "UIS.XGDP.2T4.V.FSGOV"      
## [523,] "UIS.XGDP.3.FSGOV"          
## [524,] "UIS.XGDP.3.FSGOV.FFNTR"    
## [525,] "UIS.XGDP.4.FSGOV"          
## [526,] "UIS.XGDP.56.FSGOV"         
## [527,] "UIS.XGDP.5T8.FSGOV.FFNTR"  
## [528,] "UIS.XGDP.5T8.FSHH.FFNTR"   
## [529,] "UIS.XGDP.FSGOV.FFNTR"      
## [530,] "UIS.XGDP.FSHH.FFNTR"       
## [531,] "UIS.XUNIT.GDPCAP.02.FSGOV" 
## [532,] "UIS.XUNIT.GDPCAP.1.FSGOV"  
## [533,] "UIS.XUNIT.GDPCAP.1.FSHH"   
## [534,] "UIS.XUNIT.GDPCAP.2.FSGOV"  
## [535,] "UIS.XUNIT.GDPCAP.23.FSGOV" 
## [536,] "UIS.XUNIT.GDPCAP.23.FSHH"  
## [537,] "UIS.XUNIT.GDPCAP.3.FSGOV"  
## [538,] "UIS.XUNIT.GDPCAP.5T8.FSGOV"
## [539,] "UIS.XUNIT.GDPCAP.5T8.FSHH" 
##        name                                                                                                                                                                       
##   [1,] "Per capita GDP growth"                                                                                                                                                    
##   [2,] "GDP (current $)"                                                                                                                                                          
##   [3,] "GDP growth (annual %)"                                                                                                                                                    
##   [4,] "GDP (constant 2005 $)"                                                                                                                                                    
##   [5,] "GDP per capita, PPP (constant 2011 international $) "                                                                                                                     
##   [6,] "Trade in services (% of GDP)"                                                                                                                                             
##   [7,] "Gross private capital flows (% of GDP, PPP)"                                                                                                                              
##   [8,] "Gross private capital flows (% of GDP)"                                                                                                                                   
##   [9,] "Gross foreign direct investment (% of GDP, PPP)"                                                                                                                          
##  [10,] "Gross foreign direct investment (% of GDP)"                                                                                                                               
##  [11,] "Wage bill as a percentage of GDP"                                                                                                                                         
##  [12,] "Merchandise imports (BOP): percentage of GDP (%)"                                                                                                                         
##  [13,] "Foreign direct investment, net outflows (% of GDP)"                                                                                                                       
##  [14,] "Foreign direct investment, net outflows (% of GDP)"                                                                                                                       
##  [15,] "Current account balance (% of GDP)"                                                                                                                                       
##  [16,] "Current account balance (% of GDP)"                                                                                                                                       
##  [17,] "Curr. acc. bal. before official transf. (% of GDP)"                                                                                                                       
##  [18,] "Current account balance excluding net official capital grants (% of GDP)"                                                                                                 
##  [19,] "Net income (% of GDP)"                                                                                                                                                    
##  [20,] "Foreign direct investment (% of GDP)"                                                                                                                                     
##  [21,] "Foreign direct investment, net inflows (% of GDP)"                                                                                                                        
##  [22,] "Private capital flows, total (% of GDP)"                                                                                                                                  
##  [23,] "Net current transfers (% of GDP)"                                                                                                                                         
##  [24,] "Merchandise exports (BOP): percentage of GDP (%)"                                                                                                                         
##  [25,] "Foreign direct investment, net inflows (% of GDP)"                                                                                                                        
##  [26,] "Foreign direct investment, net inflows (% of GDP)"                                                                                                                        
##  [27,] "Migrant remittance inflows (% of GDP)"                                                                                                                                    
##  [28,] "Personal remittances, received (% of GDP)"                                                                                                                                
##  [29,] "Workers' remittances, receipts (% of GDP)"                                                                                                                                
##  [30,] "Financing via international capital markets (gross inflows, % of GDP)"                                                                                                    
##  [31,] "Market capitalization of listed domestic companies (% of GDP)"                                                                                                            
##  [32,] "Stocks traded, total value (% of GDP)"                                                                                                                                    
##  [33,] "Gross PSD, Budgetary Central Gov., All maturities, All instruments, Domestic creditors, Nominal Value, % of GDP"                                                          
##  [34,] "Gross PSD, Central Gov., All maturities, All instruments, Domestic creditors, Nominal Value, % of GDP"                                                                    
##  [35,] "Gross PSD, Financial Public Corp., All maturities, All instruments, Domestic creditors, Nominal Value, % of GDP"                                                          
##  [36,] "Gross PSD, General Gov., All maturities, All instruments, Domestic creditors, Nominal Value, % of GDP"                                                                    
##  [37,] "Gross PSD, Nonfinancial Public Corp., All maturities, All instruments, Domestic creditors, Nominal Value, % of GDP"                                                       
##  [38,] "Gross PSD, Budgetary Central Gov., All maturities, All instruments, Foreign currency, Nominal Value, % of GDP"                                                            
##  [39,] "Gross PSD, Central Gov., All maturities, All instruments, Foreign currency, Nominal Value, % of GDP"                                                                      
##  [40,] "Gross PSD, Financial Public Corp., All maturities, All instruments, Foreign currency, Nominal Value, % of GDP"                                                            
##  [41,] "Gross PSD, General Gov., All maturities, All instruments, Foreign currency, Nominal Value, % of GDP"                                                                      
##  [42,] "Gross PSD, Nonfinancial Public Corp., All maturities, All instruments, Foreign currency, Nominal Value, % of GDP"                                                         
##  [43,] "Gross PSD, Budgetary Central Gov., All maturities, All instruments, Domestic currency, Nominal Value, % of GDP"                                                           
##  [44,] "Gross PSD, Central Gov., All maturities, All instruments, Domestic currency, Nominal Value, % of GDP"                                                                     
##  [45,] "Gross PSD, Financial Public Corp., All maturities, All instruments, Domestic currency, Nominal Value, % of GDP"                                                           
##  [46,] "Gross PSD, General Gov., All maturities, All instruments, Domestic currency, Nominal Value, % of GDP"                                                                     
##  [47,] "Gross PSD, Nonfinancial Public Corp., All maturities, All instruments, Domestic currency, Nominal Value, % of GDP"                                                        
##  [48,] "Gross PSD, Budgetary Central Gov., All maturities, All instruments, Nominal Value, % of GDP"                                                                              
##  [49,] "Gross PSD, Central Gov., All maturities, All instruments, Nominal Value, % of GDP"                                                                                        
##  [50,] "Gross PSD, Financial Public Corp., All maturities, All instruments, Nominal Value, % of GDP"                                                                              
##  [51,] "Gross PSD, General Gov., All maturities, All instruments, Nominal Value, % of GDP"                                                                                        
##  [52,] "Gross PSD, Nonfinancial Public Corp., All maturities, All instruments, Nominal Value, % of GDP"                                                                           
##  [53,] "Gross PSD, Budgetary Central Gov., All maturities, All instruments, External creditors, Nominal Value, % of GDP"                                                          
##  [54,] "Gross PSD, Central Gov., All maturities, All instruments, External creditors, Nominal Value, % of GDP"                                                                    
##  [55,] "Gross PSD, Financial Public Corp., All maturities, All instruments, External creditors, Nominal Value, % of GDP"                                                          
##  [56,] "Gross PSD, General Gov., All maturities, All instruments, External creditors, Nominal Value, % of GDP"                                                                    
##  [57,] "Gross PSD, Nonfinancial Public Corp., All maturities, All instruments, External creditors, Nominal Value, % of GDP"                                                       
##  [58,] "Gross PSD, Budgetary Central Gov., All maturities, Currency and deposits, Nominal Value, % of GDP"                                                                        
##  [59,] "Gross PSD, Central Gov., All maturities, Currency and deposits, Nominal Value, % of GDP"                                                                                  
##  [60,] "Gross PSD, Financial Public Corp., All maturities, Currency and deposits, Nominal Value, % of GDP"                                                                        
##  [61,] "Gross PSD, General Gov., All maturities, Currency and deposits, Nominal Value, % of GDP"                                                                                  
##  [62,] "Gross PSD, Budgetary Central Gov., Long-term, With payment due in one year or less, Currency and deposits, Nominal Value, % of GDP"                                       
##  [63,] "Gross PSD, Central Gov., Long-term, With payment due in one year or less, Currency and deposits, Nominal Value, % of GDP"                                                 
##  [64,] "Gross PSD, Financial Public Corp., Long-term, With payment due in one year or less, Currency and deposits, Nominal Value, % of GDP"                                       
##  [65,] "Gross PSD, General Gov., Long-term, With payment due in one year or less, Currency and deposits, Nominal Value, % of GDP"                                                 
##  [66,] "Gross PSD, Nonfinancial Public Corp., Long-term, With payment due in one year or less, Currency and deposits, Nominal Value, % of GDP"                                    
##  [67,] "Gross PSD, Budgetary Central Gov., Long-term, With payment due in more than one year, Currency and deposits, Nominal Value, % of GDP"                                     
##  [68,] "Gross PSD, Central Gov., Long-term, With payment due in more than one year, Currency and deposits, Nominal Value, % of GDP"                                               
##  [69,] "Gross PSD, Financial Public Corp., Long-term, With payment due in more than one year, Currency and deposits, Nominal Value, % of GDP"                                     
##  [70,] "Gross PSD, General Gov., Long-term, With payment due in more than one year, Currency and deposits, Nominal Value, % of GDP"                                               
##  [71,] "Gross PSD, Nonfinancial Public Corp., Long-term, With payment due in more than one year, Currency and deposits, Nominal Value, % of GDP"                                  
##  [72,] "Gross PSD, Nonfinancial Public Corp., All maturities, Currency and deposits, Nominal Value, % of GDP"                                                                     
##  [73,] "Gross PSD, Central Gov.-D1, All maturities, Debt securities + loans, Nominal Value, % of GDP"                                                                             
##  [74,] "Gross PSD, General Gov.-D1, All maturities, Debt securities + loans, Nominal Value, % of GDP"                                                                             
##  [75,] "Gross PSD, Central Gov.-D2, All maturities, D1+ SDRs + currency and deposits, Nominal Value, % of GDP"                                                                    
##  [76,] "Gross PSD, General Gov.-D2, All maturities, D1+ SDRs + currency and deposits, Nominal Value, % of GDP"                                                                    
##  [77,] "Gross PSD, Central Gov.-D2A, All maturities, D1+ currency and deposits, Maastricht debt, % of GDP"                                                                        
##  [78,] "Gross PSD, General Gov.-D2A, All maturities, D1+ currency and deposits, Maastricht debt, % of GDP"                                                                        
##  [79,] "Gross PSD, Central Gov.-D3, All maturities, D2+other accounts payable, Nominal Value, % of GDP"                                                                           
##  [80,] "Gross PSD, General Gov.-D3, All maturities, D2+other accounts payable, Nominal Value, % of GDP"                                                                           
##  [81,] "Gross PSD, Central Gov.-D4, All maturities, D3+insurance, pensions, and standardized guarantees, Nominal Value, % of GDP"                                                 
##  [82,] "Gross PSD, General Gov.-D4, All maturities, D3+insurance, pensions, and standardized guarantees, Nominal Value, % of GDP"                                                 
##  [83,] "Gross PSD, Budgetary Central Gov., All maturities, Debt securities, Nominal Value, % of GDP"                                                                              
##  [84,] "Gross PSD, Central Gov., All maturities, Debt securities, Nominal Value, % of GDP"                                                                                        
##  [85,] "Gross PSD, Financial Public Corp., All maturities, Debt securities, Nominal Value, % of GDP"                                                                              
##  [86,] "Gross PSD, General Gov., All maturities, Debt securities, Nominal Value, % of GDP"                                                                                        
##  [87,] "Gross PSD, Budgetary Central Gov., Long-term, With payment due in one year or less, Debt securities, Nominal Value, % of GDP"                                             
##  [88,] "Gross PSD, Central Gov., Long-term, With payment due in one year or less, Debt securities, Nominal Value, % of GDP"                                                       
##  [89,] "Gross PSD, Financial Public Corp., Long-term, With payment due in one year or less, Debt securities, Nominal Value, % of GDP"                                             
##  [90,] "Gross PSD, General Gov., Long-term, With payment due in one year or less, Debt securities, Nominal Value, % of GDP"                                                       
##  [91,] "Gross PSD, Nonfinancial Public Corp., Long-term, With payment due in one year or less, Debt securities, Nominal Value, % of GDP"                                          
##  [92,] "Gross PSD, Budgetary Central Gov., Long-term, With payment due in more than one year, Debt securities, Nominal Value, % of GDP"                                           
##  [93,] "Gross PSD, Central Gov., Long-term, With payment due in more than one year, Debt securities, Nominal Value, % of GDP"                                                     
##  [94,] "Gross PSD, Financial Public Corp., Long-term, With payment due in more than one year, Debt securities, Nominal Value, % of GDP"                                           
##  [95,] "Gross PSD, General Gov., Long-term, With payment due in more than one year, Debt securities, Nominal Value, % of GDP"                                                     
##  [96,] "Gross PSD, Nonfinancial Public Corp., Long-term, With payment due in more than one year, Debt securities, Nominal Value, % of GDP"                                        
##  [97,] "Gross PSD, Budgetary Central Gov., All maturities, Debt Securities, Market value, % of GDP"                                                                               
##  [98,] "Gross PSD, Central Gov., All maturities, Debt Securities, Market value, % of GDP"                                                                                         
##  [99,] "Gross PSD, Financial Public Corp., All maturities, Debt Securities, Market value, % of GDP"                                                                               
## [100,] "Gross PSD, General Gov., All maturities, Debt Securities, Market value, % of GDP"                                                                                         
## [101,] "Gross PSD, Nonfinancial Public Corp., All maturities, Debt Securities, Market value, % of GDP"                                                                            
## [102,] "Gross PSD, Nonfinancial Public Corp., All maturities, Debt securities, Nominal Value, % of GDP"                                                                           
## [103,] "Gross PSD, Budgetary Central Gov., All maturities, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"                                      
## [104,] "Gross PSD, Central Gov., All maturities, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"                                                
## [105,] "Gross PSD, Financial Public Corp., All maturities, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"                                      
## [106,] "Gross PSD, General Gov., All maturities, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"                                                
## [107,] "Gross PSD, Budgetary Central Gov., Long-term, With payment due in one year or less, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"     
## [108,] "Gross PSD, Central Gov., Long-term, With payment due in one year or less, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"               
## [109,] "Gross PSD, Financial Public Corp., Long-term, With payment due in one year or less, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"     
## [110,] "Gross PSD, General Gov., Long-term, With payment due in one year or less, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"               
## [111,] "Gross PSD, Nonfinancial Public Corp., Long-term, With payment due in one year or less, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"  
## [112,] "Gross PSD, Budgetary Central Gov., Long-term, With payment due in more than one year, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"   
## [113,] "Gross PSD, Central Gov., Long-term, With payment due in more than one year, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"             
## [114,] "Gross PSD, Financial Public Corp., Long-term, With payment due in more than one year, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"   
## [115,] "Gross PSD, General Gov., Long-term, With payment due in more than one year, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"             
## [116,] "Gross PSD, Nonfinancial Public Corp., Long-term, With payment due in more than one year, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"
## [117,] "Gross PSD, Nonfinancial Public Corp., All maturities, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"                                   
## [118,] "Gross PSD, Budgetary Central Gov., All maturities, Loans, Nominal Value, % of GDP"                                                                                        
## [119,] "Gross PSD, Central Gov., All maturities, Loans, Nominal Value, % of GDP"                                                                                                  
## [120,] "Gross PSD, Financial Public Corp., All maturities, Loans, Nominal Value, % of GDP"                                                                                        
## [121,] "Gross PSD, General Gov., All maturities, Loans, Nominal Value, % of GDP"                                                                                                  
## [122,] "Gross PSD, Budgetary Central Gov., Long-term, With payment due in one year or less, Loans, Nominal Value, % of GDP"                                                       
## [123,] "Gross PSD, Central Gov., Long-term, With payment due in one year or less, Loans, Nominal Value, % of GDP"                                                                 
## [124,] "Gross PSD, Financial Public Corp., Long-term, With payment due in one year or less, Loans, Nominal Value, % of GDP"                                                       
## [125,] "Gross PSD, General Gov., Long-term, With payment due in one year or less, Loans, Nominal Value, % of GDP"                                                                 
## [126,] "Gross PSD, Nonfinancial Public Corp., Long-term, With payment due in one year or less, Loans, Nominal Value, % of GDP"                                                    
## [127,] "Gross PSD, Budgetary Central Gov., Long-term, With payment due in more than one year, Loans, Nominal Value, % of GDP"                                                     
## [128,] "Gross PSD, Central Gov., Long-term, With payment due in more than one year, Loans, Nominal Value, % of GDP"                                                               
## [129,] "Gross PSD, Financial Public Corp., Long-term, With payment due in more than one year, Loans, Nominal Value, % of GDP"                                                     
## [130,] "Gross PSD, General Gov., Long-term, With payment due in more than one year, Loans, Nominal Value, % of GDP"                                                               
## [131,] "Gross PSD, Nonfinancial Public Corp., Long-term, With payment due in more than one year, Loans, Nominal Value, % of GDP"                                                  
## [132,] "Gross PSD, Nonfinancial Public Corp., All maturities, Loans, Nominal Value, % of GDP"                                                                                     
## [133,] "Gross PSD, Budgetary Central Gov., All maturities, Other accounts payable, Nominal Value, % of GDP"                                                                       
## [134,] "Gross PSD, Central Gov., All maturities, Other accounts payable, Nominal Value, % of GDP"                                                                                 
## [135,] "Gross PSD, Financial Public Corp., All maturities, Other accounts payable, Nominal Value, % of GDP"                                                                       
## [136,] "Gross PSD, General Gov., All maturities, Other accounts payable, Nominal Value, % of GDP"                                                                                 
## [137,] "Gross PSD, Budgetary Central Gov., Long-term, With payment due in one year or less, Other accounts payable, Nominal Value, % of GDP"                                      
## [138,] "Gross PSD, Central Gov., Long-term, With payment due in one year or less, Other accounts payable, Nominal Value, % of GDP"                                                
## [139,] "Gross PSD, Financial Public Corp., Long-term, With payment due in one year or less, Other accounts payable, Nominal Value, % of GDP"                                      
## [140,] "Gross PSD, General Gov., Long-term, With payment due in one year or less, Other accounts payable, Nominal Value, % of GDP"                                                
## [141,] "Gross PSD, Nonfinancial Public Corp., Long-term, With payment due in one year or less, Other accounts payable, Nominal Value, % of GDP"                                   
## [142,] "Gross PSD, Budgetary Central Gov., Long-term, With payment due in more than one year, Other accounts payable, Nominal Value, % of GDP"                                    
## [143,] "Gross PSD, Central Gov., Long-term, With payment due in more than one year, Other accounts payable, Nominal Value, % of GDP"                                              
## [144,] "Gross PSD, Financial Public Corp., Long-term, With payment due in more than one year, Other accounts payable, Nominal Value, % of GDP"                                    
## [145,] "Gross PSD, General Gov., Long-term, With payment due in more than one year, Other accounts payable, Nominal Value, % of GDP"                                              
## [146,] "Gross PSD, Nonfinancial Public Corp., Long-term, With payment due in more than one year, Other accounts payable, Nominal Value, % of GDP"                                 
## [147,] "Gross PSD, Nonfinancial Public Corp., All maturities, Other accounts payable, Nominal Value, % of GDP"                                                                    
## [148,] "Gross PSD, Budgetary Central Gov., All maturities, Special Drawing Rights, Nominal Value, % of GDP"                                                                       
## [149,] "Gross PSD, Central Gov., All maturities, Special Drawing Rights, Nominal Value, % of GDP"                                                                                 
## [150,] "Gross PSD, Financial Public Corp., All maturities, Special Drawing Rights, Nominal Value, % of GDP"                                                                       
## [151,] "Gross PSD, General Gov., All maturities, Special Drawing Rights, Nominal Value, % of GDP"                                                                                 
## [152,] "Gross PSD, Budgetary Central Gov., Long-term, With payment due in more than one year, Special Drawing Rights, Nominal Value, % of GDP"                                    
## [153,] "Gross PSD, Central Gov., Long-term, With payment due in more than one year, Special Drawing Rights, Nominal Value, % of GDP"                                              
## [154,] "Gross PSD, Financial Public Corp., Long-term, With payment due in more than one year, Special Drawing Rights, Nominal Value, % of GDP"                                    
## [155,] "Gross PSD, General Gov., Long-term, With payment due in more than one year, Special Drawing Rights, Nominal Value, % of GDP"                                              
## [156,] "Gross PSD, Nonfinancial Public Corp., Long-term, With payment due in more than one year, Special Drawing Rights, Nominal Value, % of GDP"                                 
## [157,] "Gross PSD, Nonfinancial Public Corp., All maturities, Special Drawing Rights, Nominal Value, % of GDP"                                                                    
## [158,] "Gross PSD, Budgetary Central Gov., Long-term, All instruments, Nominal Value, % of GDP"                                                                                   
## [159,] "Gross PSD, Central Gov., Long-term, All instruments, Nominal Value, % of GDP"                                                                                             
## [160,] "Gross PSD, Financial Public Corp., Long-term, All instruments, Nominal Value, % of GDP"                                                                                   
## [161,] "Gross PSD, General Gov., Long-term, All instruments, Nominal Value, % of GDP"                                                                                             
## [162,] "Gross PSD, Budgetary Central Gov., Long-term, With payment due in one year or less, All instruments, Nominal Value, % of GDP"                                             
## [163,] "Gross PSD, Central Gov., Long-term, With payment due in one year or less, All instruments, Nominal Value, % of GDP"                                                       
## [164,] "Gross PSD, Financial Public Corp., Long-term, With payment due in one year or less, All instruments, Nominal Value, % of GDP"                                             
## [165,] "Gross PSD, General Gov., Long-term, With payment due in one year or less, All instruments, Nominal Value, % of GDP"                                                       
## [166,] "Gross PSD, Nonfinancial Public Corp., Long-term, With payment due in one year or less, All instruments, Nominal Value, % of GDP"                                          
## [167,] "Gross PSD, Budgetary Central Gov., Long-term, With payment due in more than one year, All instruments, Nominal Value, % of GDP"                                           
## [168,] "Gross PSD, Central Gov., Long-term, With payment due in more than one year, All instruments, Nominal Value, % of GDP"                                                     
## [169,] "Gross PSD, Financial Public Corp., Long-term, With payment due in more than one year, All instruments, Nominal Value, % of GDP"                                           
## [170,] "Gross PSD, General Gov., Long-term, With payment due in more than one year, All instruments, Nominal Value, % of GDP"                                                     
## [171,] "Gross PSD, Nonfinancial Public Corp., Long-term, With payment due in more than one year, All instruments, Nominal Value, % of GDP"                                        
## [172,] "Gross PSD, Nonfinancial Public Corp., Long-term, All instruments, Nominal Value, % of GDP"                                                                                
## [173,] "Gross PSD, Budgetary Central Gov., Short-term, Currency and deposits, Nominal Value, % of GDP"                                                                            
## [174,] "Gross PSD, Central Gov., Short-term, Currency and deposits, Nominal Value, % of GDP"                                                                                      
## [175,] "Gross PSD, Financial Public Corp., Short-term, Currency and deposits, Nominal Value, % of GDP"                                                                            
## [176,] "Gross PSD, General Gov., Short-term, Currency and deposits, Nominal Value, % of GDP"                                                                                      
## [177,] "Gross PSD, Nonfinancial Public Corp., Short-term, Currency and deposits, Nominal Value, % of GDP"                                                                         
## [178,] "Gross PSD, Budgetary Central Gov., Short-term, Debt securities, Nominal Value, % of GDP"                                                                                  
## [179,] "Gross PSD, Central Gov., Short-term, Debt securities, Nominal Value, % of GDP"                                                                                            
## [180,] "Gross PSD, Financial Public Corp., Short-term, Debt securities, Nominal Value, % of GDP"                                                                                  
## [181,] "Gross PSD, General Gov., Short-term, Debt securities, Nominal Value, % of GDP"                                                                                            
## [182,] "Gross PSD, Nonfinancial Public Corp., Short-term, Debt securities, Nominal Value, % of GDP"                                                                               
## [183,] "Gross PSD, Budgetary Central Gov., Short-term, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"                                          
## [184,] "Gross PSD, Central Gov., Short-term, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"                                                    
## [185,] "Gross PSD, Financial Public Corp., Short-term, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"                                          
## [186,] "Gross PSD, General Gov., Short-term, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"                                                    
## [187,] "Gross PSD, Nonfinancial Public Corp., Short-term, Insurance, pensions, and standardized guarantee schemes, Nominal Value, % of GDP"                                       
## [188,] "Gross PSD, Budgetary Central Gov., Short-term, Loans, Nominal Value, % of GDP"                                                                                            
## [189,] "Gross PSD, Central Gov., Short-term, Loans, Nominal Value, % of GDP"                                                                                                      
## [190,] "Gross PSD, Financial Public Corp., Short-term, Loans, Nominal Value, % of GDP"                                                                                            
## [191,] "Gross PSD, General Gov., Short-term, Loans, Nominal Value, % of GDP"                                                                                                      
## [192,] "Gross PSD, Nonfinancial Public Corp., Short-term, Loans, Nominal Value, % of GDP"                                                                                         
## [193,] "Gross PSD, Budgetary Central Gov., Short-term, Other accounts payable, Nominal Value, % of GDP"                                                                           
## [194,] "Gross PSD, Central Gov., Short-term, Other accounts payable, Nominal Value, % of GDP"                                                                                     
## [195,] "Gross PSD, Financial Public Corp., Short-term, Other accounts payable, Nominal Value, % of GDP"                                                                           
## [196,] "Gross PSD, General Gov., Short-term, Other accounts payable, Nominal Value, % of GDP"                                                                                     
## [197,] "Gross PSD, Nonfinancial Public Corp., Short-term, Other accounts payable, Nominal Value, % of GDP"                                                                        
## [198,] "Gross PSD, Budgetary Central Gov., Short-term, All instruments, Nominal Value, % of GDP"                                                                                  
## [199,] "Gross PSD, Central Gov., Short-term, All instruments, Nominal Value, % of GDP"                                                                                            
## [200,] "Gross PSD, Financial Public Corp., Short-term, All instruments, Nominal Value, % of GDP"                                                                                  
## [201,] "Gross PSD, General Gov., Short-term, All instruments, Nominal Value, % of GDP"                                                                                            
## [202,] "Gross PSD, Nonfinancial Public Corp., Short-term, All instruments, Nominal Value, % of GDP"                                                                               
## [203,] "Debt on Concessional terms to GDP (% of GDP)"                                                                                                                             
## [204,] "Debt on Non-concessional terms to GDP (% of GDP)"                                                                                                                         
## [205,] "Debt outstanding and disbursed, Total to GDP (% of GDP)"                                                                                                                  
## [206,] "Net ODA received (% of GDP)"                                                                                                                                              
## [207,] "Net ODA received from DAC donors (% of recipient's GDP)"                                                                                                                  
## [208,] "Net ODA received from multilateral donors (% of GDP)"                                                                                                                     
## [209,] "Net ODA received from non-DAC bilateral donors (% of GDP)"                                                                                                                
## [210,] "Net ODA received (% of GDP)"                                                                                                                                              
## [211,] "Total debt service (% of GDP)"                                                                                                                                            
## [212,] "Energy intensity level of primary energy (MJ/$2011 PPP GDP)"                                                                                                              
## [213,] "GDP per unit of energy use (1987 US$ per kg of oil equivalent)"                                                                                                           
## [214,] "GDP per unit of energy use (2000 US$ per kg of oil equivalent)"                                                                                                           
## [215,] "GDP per unit of energy use (PPP $ per kg of oil equivalent)"                                                                                                              
## [216,] "GDP per unit of energy use (constant 2017 PPP $ per kg of oil equivalent)"                                                                                                
## [217,] "Energy use (kg of oil equivalent) per $1,000 GDP (constant 2017 PPP)"                                                                                                     
## [218,] "CO2 emissions, industrial (kg per 1987 US$ of GDP)"                                                                                                                       
## [219,] "CO2 emissions, industrial (kg per 1987 US$ of GDP)"                                                                                                                       
## [220,] "CO2 emissions (kg per 2010 US$ of GDP)"                                                                                                                                   
## [221,] "CO2 emissions (kg per PPP $ of GDP)"                                                                                                                                      
## [222,] "CO2 emissions (kg per 2017 PPP $ of GDP)"                                                                                                                                 
## [223,] "Water productivity, total (constant 2010 US$ GDP per cubic meter of total freshwater withdrawal)"                                                                         
## [224,] "GDP per unit of energy use (1987 US$ per kg of oil equivalent)"                                                                                                           
## [225,] "Deposit insurance coverage (% of GDP per capita)"                                                                                                                         
## [226,] "Domestic credit to private sector by banks (% of GDP)"                                                                                                                    
## [227,] "Total reserves includes gold (% of GDP)"                                                                                                                                  
## [228,] "Claims on governments and other public entities (% of GDP)"                                                                                                               
## [229,] "Broad money (% of GDP)"                                                                                                                                                   
## [230,] "Money and quasi money (M2) as % of GDP"                                                                                                                                   
## [231,] "Money and quasi money (M2) as % of GDP"                                                                                                                                   
## [232,] "Income velocity of money (GDP/M2)"                                                                                                                                        
## [233,] "Quasi-liquid liabilities (% of GDP)"                                                                                                                                      
## [234,] "Seignorage (% of GDP)"                                                                                                                                                    
## [235,] "Claims on central government, etc. (% GDP)"                                                                                                                               
## [236,] "Claims on other sectors of the domestic economy (% of GDP)"                                                                                                               
## [237,] "Domestic credit provided by financial sector (% of GDP)"                                                                                                                  
## [238,] "Domestic credit provided by banking sector (% of GDP)"                                                                                                                    
## [239,] "Domestic credit to private sector (% of GDP)"                                                                                                                             
## [240,] "Credit to private sector (% of GDP)"                                                                                                                                      
## [241,] "Liquid liabilities (M3) as % of GDP"                                                                                                                                      
## [242,] "Liquid liabilities (M3) as % of GDP"                                                                                                                                      
## [243,] "Quasi-liquid liabilities (% of GDP)"                                                                                                                                      
## [244,] "Overall budget balance, including grants (% of GDP)"                                                                                                                      
## [245,] "Overall budget deficit, including grants (% of GDP)"                                                                                                                      
## [246,] "Central government debt, total (% of GDP)"                                                                                                                                
## [247,] "Central government debt, total (% of GDP)"                                                                                                                                
## [248,] "Financing from abroad (% of GDP)"                                                                                                                                         
## [249,] "Financing from abroad (% of GDP)"                                                                                                                                         
## [250,] "Domestic financing, total (% of GDP)"                                                                                                                                     
## [251,] "Domestic finanacing (% of GDP)"                                                                                                                                           
## [252,] "Current revenue, excluding grants (% of GDP)"                                                                                                                             
## [253,] "Current revenue (% of GDP)"                                                                                                                                               
## [254,] "Central government revenues, excluding all grants (% of GDP)"                                                                                                             
## [255,] "Current revenue, excluding grants (% of GDP)"                                                                                                                             
## [256,] "SOE external debt (% of GDP)"                                                                                                                                             
## [257,] "State-owned enterprises, economic activity (% of GDP)"                                                                                                                    
## [258,] "SOE economic activity (% of GDP)"                                                                                                                                         
## [259,] "State-owned enterprises, net financial flows from government (% of GDP)"                                                                                                  
## [260,] "SOE net financial flows from government (% of GDP)"                                                                                                                       
## [261,] "State-owned enterprises, overall balance before transfers (% of GDP)"                                                                                                     
## [262,] "Tax revenue (% of GDP)"                                                                                                                                                   
## [263,] "Tax revenue (% of GDP)"                                                                                                                                                   
## [264,] "Defense expenditure (% of GDP)"                                                                                                                                           
## [265,] "Research and development expenditure (% of GDP)"                                                                                                                          
## [266,] "Expenditure, total (% of GDP)"                                                                                                                                            
## [267,] "Total expenditure (% of GDP)"                                                                                                                                             
## [268,] "Net acquisition of financial assets (% of GDP)"                                                                                                                           
## [269,] "Cash surplus/deficit (% of GDP)"                                                                                                                                          
## [270,] "Central government debt, total (% of GDP)"                                                                                                                                
## [271,] "Net incurrence of liabilities, domestic (% of GDP)"                                                                                                                       
## [272,] "Net incurrence of liabilities, foreign (% of GDP)"                                                                                                                        
## [273,] "Net incurrence of liabilities, total (% of GDP)"                                                                                                                          
## [274,] "Net investment in nonfinancial assets (% of GDP)"                                                                                                                         
## [275,] "Net lending (+) / net borrowing (-) (% of GDP)"                                                                                                                           
## [276,] "Revenue, excluding grants (% of GDP)"                                                                                                                                     
## [277,] "Tax revenue (% of GDP)"                                                                                                                                                   
## [278,] "Expense (% of GDP)"                                                                                                                                                       
## [279,] "Expenditure shares of GDP (percentage share, GDP=100, XR term)"                                                                                                           
## [280,] "Private credit by deposit money banks to GDP (%)"                                                                                                                         
## [281,] "Deposit money banks'' assets to GDP (%)"                                                                                                                                  
## [282,] "Nonbank financial institutions’ assets to GDP (%)"                                                                                                                        
## [283,] "Liquid liabilities to GDP (%)"                                                                                                                                            
## [284,] "Central bank assets to GDP (%)"                                                                                                                                           
## [285,] "Mutual fund assets to GDP (%)"                                                                                                                                            
## [286,] "Financial system deposits to GDP (%)"                                                                                                                                     
## [287,] "Life insurance premium volume to GDP (%)"                                                                                                                                 
## [288,] "Non-life insurance premium volume to GDP (%)"                                                                                                                             
## [289,] "Insurance company assets to GDP (%)"                                                                                                                                      
## [290,] "Private credit by deposit money banks and other financial institutions to GDP (%)"                                                                                        
## [291,] "Pension fund assets to GDP (%)"                                                                                                                                           
## [292,] "Domestic credit to private sector (% of GDP)"                                                                                                                             
## [293,] "Stock market capitalization to GDP (%)"                                                                                                                                   
## [294,] "Stock market total value traded to GDP (%)"                                                                                                                               
## [295,] "Outstanding domestic private debt securities to GDP (%)"                                                                                                                  
## [296,] "Outstanding domestic public debt securities to GDP (%)"                                                                                                                   
## [297,] "Outstanding international private debt securities to GDP (%)"                                                                                                             
## [298,] "Outstanding international public debt securities to GDP (%)"                                                                                                              
## [299,] "International debt issues to GDP (%)"                                                                                                                                     
## [300,] "Gross portfolio equity liabilities to GDP (%)"                                                                                                                            
## [301,] "Gross portfolio equity assets to GDP (%)"                                                                                                                                 
## [302,] "Gross portfolio debt liabilities to GDP (%)"                                                                                                                              
## [303,] "Gross portfolio debt assets to GDP (%)"                                                                                                                                   
## [304,] "Syndicated loan issuance volume to GDP (%)"                                                                                                                               
## [305,] "Corporate bond issuance volume to GDP (%)"                                                                                                                                
## [306,] "Credit to government and state-owned enterprises to GDP (%)"                                                                                                              
## [307,] "Bank deposits to GDP (%)"                                                                                                                                                 
## [308,] "Loans from nonresident banks (net) to GDP (%)"                                                                                                                            
## [309,] "Loans from nonresident banks (amounts outstanding) to GDP (%)"                                                                                                            
## [310,] "Remittance inflows to GDP (%)"                                                                                                                                            
## [311,] "Consolidated foreign claims of BIS reporting banks to GDP (%)"                                                                                                            
## [312,] "Global leasing volume to GDP (%)"                                                                                                                                         
## [313,] "Total factoring volume to GDP (%)"                                                                                                                                        
## [314,] "Information and communication technology expenditure (% of GDP)"                                                                                                          
## [315,] "Railways, goods transported (ton-km per PPP $ million of GDP)"                                                                                                            
## [316,] "Railways, passenger-km (per PPP $ million of GDP)"                                                                                                                        
## [317,] "Telecommunications revenue (% GDP)"                                                                                                                                       
## [318,] "Military expenditure (% of GDP)"                                                                                                                                          
## [319,] "GDP on Accommodation & Food Beverages Activity Sector (in IDR Million), SNA 2008, Current Price"                                                                          
## [320,] "GDP on Accommodation & Food Beverages Activity Sector (in IDR Million), SNA 2008, Constant Price"                                                                         
## [321,] "GDP on Agriculture Sector (in IDR Million), Current Price"                                                                                                                
## [322,] "GDP on Agriculture Sector (in IDR Million), Constant Price"                                                                                                               
## [323,] "GDP on Agriculture, Forestry & Fisheries Sector (in IDR Million), SNA 2008, Current Price"                                                                                
## [324,] "GDP on Agriculture, Forestry & Fisheries Sector (in IDR Million), SNA 2008, Constant Price"                                                                               
## [325,] "GDP on Business Services Sector (in IDR Million), SNA 2008, Current Price"                                                                                                
## [326,] "GDP on Business Services Sector (in IDR Million), SNA 2008, Constant Price"                                                                                               
## [327,] "GDP on Construction Sector (in IDR Million), Current Price"                                                                                                               
## [328,] "GDP on Construction Sector (in IDR Million), Constant Price"                                                                                                              
## [329,] "GDP on Construction Sector (in IDR Million), SNA 2008, Current Price"                                                                                                     
## [330,] "GDP on Construction Sector (in IDR Million), SNA 2008, Constant Price"                                                                                                    
## [331,] "GDP on Education Services Sector (in IDR Million), SNA 2008, Current Price"                                                                                               
## [332,] "GDP on Education Services Sector (in IDR Million), SNA 2008, Constant Price"                                                                                              
## [333,] "GDP on Electricity & Gas Supply Sector (in IDR Million), SNA 2008, Current Price"                                                                                         
## [334,] "GDP on Electricity & Gas Supply Sector (in IDR Million), SNA 2008, Constant Price"                                                                                        
## [335,] "Total GDP excluding Oil and Gas (in IDR Million), Current Price"                                                                                                          
## [336,] "Total GDP excluding Oil and Gas (in IDR Million), Constant Price"                                                                                                         
## [337,] "GDP on Financial Service Sector (in IDR Million), Current Price"                                                                                                          
## [338,] "GDP on Financial Service Sector (in IDR Million), Constant Price"                                                                                                         
## [339,] "GDP on Financial & Insurance Activity Sector (in IDR Million), SNA 2008, Current Price"                                                                                   
## [340,] "GDP on Financial & Insurance Activity Sector (in IDR Million), SNA 2008, Constant Price"                                                                                  
## [341,] "GDP on Human Health & Social Work Activity Sector (in IDR Million), SNA 2008, Current Price"                                                                              
## [342,] "GDP on Human Health & Social Work Activity Sector (in IDR Million), SNA 2008, Constant Price"                                                                             
## [343,] "Total GDP including Oil and Gas (in IDR Million), Current Price"                                                                                                          
## [344,] "Total GDP including Oil and Gas (in IDR Million), Constant Price"                                                                                                         
## [345,] "Total GDP including Oil and Gas (in IDR Million), SNA 2008, Current Price"                                                                                                
## [346,] "Total GDP including Oil and Gas (in IDR Million), SNA 2008, Constant Price"                                                                                               
## [347,] "GDP on Information & Communication Sector (in IDR Million), SNA 2008, Current Price"                                                                                      
## [348,] "GDP on Information & Communication Sector (in IDR Million), SNA 2008, Constant Price"                                                                                     
## [349,] "GDP on Mining and Quarrying Sector (in IDR Million), Current Price"                                                                                                       
## [350,] "GDP on Mining and Quarrying Sector (in IDR Million), Constant Price"                                                                                                      
## [351,] "GDP on Mining & Quarrying Sector (in IDR Million), SNA 2008, Current Price"                                                                                               
## [352,] "GDP on Mining & Quarrying Sector (in IDR Million), SNA 2008, Constant Price"                                                                                              
## [353,] "GDP on Manufacturing Sector (in IDR Million), Current Price"                                                                                                              
## [354,] "GDP on Manufacturing Sector (in IDR Million), Constant Price"                                                                                                             
## [355,] "GDP on Manufacturing Industry Sector (in IDR Million), SNA 2008, Current Price"                                                                                           
## [356,] "GDP on Manufacturing Industry Sector (in IDR Million), SNA 2008, Constant Price"                                                                                          
## [357,] "GDP on Public Administration, Defense & Compulsory Social Security Sector (in IDR Million), SNA 2008, Current Price"                                                      
## [358,] "GDP on Public Administration, Defense & Compulsory Social Security Sector (in IDR Million), SNA 2008, Constant Price"                                                     
## [359,] "GDP on Real Estate Sector (in IDR Million), SNA 2008, Current Price"                                                                                                      
## [360,] "GDP on Real Estate Sector (in IDR Million), SNA 2008, Constant Price"                                                                                                     
## [361,] "GDP on Other Service Sector (in IDR Million), Current Price"                                                                                                              
## [362,] "GDP on Other Service Sector (in IDR Million), Constant Price"                                                                                                             
## [363,] "GDP on Other Services Sector (in IDR Million), SNA 2008, Current Price"                                                                                                   
## [364,] "GDP on Other Services Sector (in IDR Million), SNA 2008, Constant Price"                                                                                                  
## [365,] "GDP on Transportation and Telecommunication Sector (in IDR Million), Current Price"                                                                                       
## [366,] "GDP on Transportation and Telecommunication Sector (in IDR Million), Constant Price"                                                                                      
## [367,] "GDP on Transportation & Storage Sector (in IDR Million), SNA 2008, Current Price"                                                                                         
## [368,] "GDP on Transportation & Storage Sector (in IDR Million), SNA 2008, Constant Price"                                                                                        
## [369,] "GDP on Trade, Hotel and Restaurant Sector (in IDR Million), Current Price"                                                                                                
## [370,] "GDP on Trade, Hotel and Restaurant Sector (in IDR Million), Constant Price"                                                                                               
## [371,] "GDP on Wholesales & Retail Trade, Repair of Motor Vehicles & Motorcycles Sector (in IDR Million), SNA 2008, Current Price"                                                
## [372,] "GDP on Wholesales & Retail Trade, Repair of Motor Vehicles & Motorcycles Sector (in IDR Million), SNA 2008, Constant Price"                                               
## [373,] "GDP on Utilities Sector (in IDR Million), Current Price"                                                                                                                  
## [374,] "GDP on Utilities Sector (in IDR Million), Constant Price"                                                                                                                 
## [375,] "GDP on Water Supply, Sewerage, Waste & Recycling Management Sector (in IDR Million), SNA 2008, Current Price"                                                             
## [376,] "GDP on Water Supply, Sewerage, Waste & Recycling Management Sector (in IDR Million), SNA 2008, Constant Price"                                                            
## [377,] "General government final consumption expenditure (% of GDP)"                                                                                                              
## [378,] "Household final consumption expenditure, etc. (% of GDP)"                                                                                                                 
## [379,] "Households and NPISHs final consumption expenditure (% of GDP)"                                                                                                           
## [380,] "Final consumption expenditure, etc. (% of GDP)"                                                                                                                           
## [381,] "Total consumption: contribution to growth of GDP (%)"                                                                                                                     
## [382,] "Final consumption expenditure (% of GDP)"                                                                                                                                 
## [383,] "Gross national expenditure (% of GDP)"                                                                                                                                    
## [384,] "Exports of goods and services (% of GDP)"                                                                                                                                 
## [385,] "GDP expenditure on general government consumption (in IDR Million)"                                                                                                       
## [386,] "GDP expenditure on general government consumption (in IDR Million), SNA 2008, Current Price"                                                                              
## [387,] "GDP expenditure on non profit private institution consumption (in IDR Million)"                                                                                           
## [388,] "GDP expenditure on non profit private institution consumption (in IDR Million), SNA 2008, Current Price"                                                                  
## [389,] "GDP expenditure on private consumption (in IDR Million)"                                                                                                                  
## [390,] "GDP expenditure on private consumption (in IDR Million), SNA 2008, Current Price"                                                                                         
## [391,] "GDP expenditure on exports (in IDR Million)"                                                                                                                              
## [392,] "GDP expenditure on exports (in IDR Million), SNA 2008, Current Price"                                                                                                     
## [393,] "Gross fixed capital formation, private sector (% of GDP)"                                                                                                                 
## [394,] "Gross public investment (% of GDP)"                                                                                                                                       
## [395,] "GDP expenditure on gross fixed capital formation (in IDR Million)"                                                                                                        
## [396,] "GDP expenditure on gross fixed capital formation (in IDR Million), SNA 2008, Current Price"                                                                               
## [397,] "Gross fixed capital formation (% of GDP)"                                                                                                                                 
## [398,] "GDP expenditure on imports (in IDR Million)"                                                                                                                              
## [399,] "GDP expenditure on imports (in IDR Million), SNA 2008, Current Price"                                                                                                     
## [400,] "GDP expenditure on inter-region net exports (in IDR Million), SNA 2008, Current Price"                                                                                    
## [401,] "GDP expenditure on changes in stock (in IDR Million)"                                                                                                                     
## [402,] "GDP expenditure on changes in stock (in IDR Million), SNA 2008, Current Price"                                                                                            
## [403,] "Total GDP based on expenditure (in IDR Million)"                                                                                                                          
## [404,] "Total GDP based on expenditure (in IDR Million), SNA 2008, Current Price"                                                                                                 
## [405,] "Gross domestic investment: contr. to growth of GDP(%)"                                                                                                                    
## [406,] "Gross capital formation (% of GDP)"                                                                                                                                       
## [407,] "Imports of goods and services (% of GDP)"                                                                                                                                 
## [408,] "Merchandise trade to GDP ratio (%)"                                                                                                                                       
## [409,] "Resource balance: contribution to growth of GDP (%)"                                                                                                                      
## [410,] "External balance on goods and services (% of GDP)"                                                                                                                        
## [411,] "Trade (% of GDP)"                                                                                                                                                         
## [412,] "Agriculture: contribution to growth of GDP (%)"                                                                                                                           
## [413,] "Industry: contribution to growth of GDP (%)"                                                                                                                              
## [414,] "Services: contribution to growth of GDP (%)"                                                                                                                              
## [415,] "Real agricultural GDP per capita growth rate (%)"                                                                                                                         
## [416,] "Real agricultural GDP growth rates (%)"                                                                                                                                   
## [417,] "Agriculture, forestry, and fishing, value added (% of GDP)"                                                                                                               
## [418,] "Manufacturing, value added (% of GDP)"                                                                                                                                    
## [419,] "Industry: contribution to growth of GDP (%)"                                                                                                                              
## [420,] "Industry (including construction), value added (% of GDP)"                                                                                                                
## [421,] "Discrepancy in GDP, value added (current US$)"                                                                                                                            
## [422,] "Discrepancy in GDP, value added (current LCU)"                                                                                                                            
## [423,] "Discrepancy in GDP, value added (constant LCU)"                                                                                                                           
## [424,] "Services: contribution to growth of GDP (%)"                                                                                                                              
## [425,] "Services, etc., value added (% of GDP)"                                                                                                                                   
## [426,] "Services, value added (% of GDP)"                                                                                                                                         
## [427,] "Agricultural support estimate (% of GDP)"                                                                                                                                 
## [428,] "Coal rents (% of GDP)"                                                                                                                                                    
## [429,] "Inflation, GDP deflator (annual %)"                                                                                                                                       
## [430,] "Inflation, GDP deflator (annual %)"                                                                                                                                       
## [431,] "Inflation, GDP deflator: linked series (annual %)"                                                                                                                        
## [432,] "GDP deflator (base year varies by country)"                                                                                                                               
## [433,] "GDP deflator (1987 = 100)"                                                                                                                                                
## [434,] "GDP deflator: linked series (base year varies by country)"                                                                                                                
## [435,] "Discrepancy in expenditure estimate of GDP (current US$)"                                                                                                                 
## [436,] "Discrepancy in expenditure estimate of GDP (current LCU)"                                                                                                                 
## [437,] "Discrepancy in expenditure estimate of GDP (constant LCU)"                                                                                                                
## [438,] "GDP at factor cost (constant 1987 US$)"                                                                                                                                   
## [439,] "GDP at factor cost (constant 1987 LCU)"                                                                                                                                   
## [440,] "Forest rents (% of GDP)"                                                                                                                                                  
## [441,] "Mineral rents (% of GDP)"                                                                                                                                                 
## [442,] "GDP (current US$)"                                                                                                                                                        
## [443,] "GDP deflator, index (2000=100; US$ series)"                                                                                                                               
## [444,] "GDP (current LCU)"                                                                                                                                                        
## [445,] "GDP: linked series (current LCU)"                                                                                                                                         
## [446,] "GDP deflator, period average (LCU index 2000=100)"                                                                                                                        
## [447,] "GDP Deflator"                                                                                                                                                             
## [448,] "GDP (constant 2010 US$)"                                                                                                                                                  
## [449,] "GDP at market prices (constant 1987 US$)"                                                                                                                                 
## [450,] "GDP growth (annual %)"                                                                                                                                                    
## [451,] "GDP (constant LCU)"                                                                                                                                                       
## [452,] "GDP at market prices (constant 1987 LCU)"                                                                                                                                 
## [453,] "GDP growth (annual %)"                                                                                                                                                    
## [454,] "GDP, PPP (current international $)"                                                                                                                                       
## [455,] "GDP, PPP (constant 2017 international $)"                                                                                                                                 
## [456,] "GDP, PPP (constant 1987 international $)"                                                                                                                                 
## [457,] "GDP deflator (1987=100,Index)"                                                                                                                                            
## [458,] "GDP deflator, end period (base year varies by country)"                                                                                                                   
## [459,] "Natural gas rents (% of GDP)"                                                                                                                                             
## [460,] "GDP per capita (current US$)"                                                                                                                                             
## [461,] "GDP per capita (current LCU)"                                                                                                                                             
## [462,] "GDP per capita (constant 2010 US$)"                                                                                                                                       
## [463,] "GDP per capita growth (annual %)"                                                                                                                                         
## [464,] "GDP per capita (constant LCU)"                                                                                                                                            
## [465,] "GDP per capita, PPP (current international $)"                                                                                                                            
## [466,] "GDP per capita, PPP (constant 2017 international $)"                                                                                                                      
## [467,] "GDP per capita, PPP (constant 1987 international $)"                                                                                                                      
## [468,] "GDP per capita, PPP annual growth (%)"                                                                                                                                    
## [469,] "Oil rents (% of GDP)"                                                                                                                                                     
## [470,] "Total natural resources rents (% of GDP)"                                                                                                                                 
## [471,] "Gross domestic savings (% of GDP)"                                                                                                                                        
## [472,] "Genuine savings: education expenditure (% of GDP)"                                                                                                                        
## [473,] "Genuine savings: carbon dioxide damage (% of GDP)"                                                                                                                        
## [474,] "Genuine savings: net forest depletion (% of GDP)"                                                                                                                         
## [475,] "Genuine savings: consumption of fixed capital (% of GDP)"                                                                                                                 
## [476,] "Genuine savings: mineral depletion (% of GDP)"                                                                                                                            
## [477,] "Genuine savings: energy depletion (% of GDP)"                                                                                                                             
## [478,] "Genuine savings: net domestic savings (% of GDP)"                                                                                                                         
## [479,] "Genuine domestic savings (% of GDP)"                                                                                                                                      
## [480,] "Gross savings (% of GDP)"                                                                                                                                                 
## [481,] "Annual percentage growth rate of GDP at market prices based on constant 2010 US Dollars."                                                                                 
## [482,] "GDP,current US$,millions,seas. adj.,"                                                                                                                                     
## [483,] "GDP,current LCU,millions,seas. adj.,"                                                                                                                                     
## [484,] "GDP,constant 2010 US$,millions,seas. adj.,"                                                                                                                               
## [485,] "GDP,constant 2010 LCU,millions,seas. adj.,"                                                                                                                               
## [486,] "PPP conversion factor, GDP (LCU per international $)"                                                                                                                     
## [487,] "2005 PPP conversion factor, GDP (LCU per international $)"                                                                                                                
## [488,] "Price level ratio of PPP conversion factor (GDP) to market exchange rate"                                                                                                 
## [489,] "EXPENDITURE SHARES (GDP=100)"                                                                                                                                             
## [490,] "Public Expenditure on Education  (% GDP)"                                                                                                                                 
## [491,] "Public spending on education, primary (% of GDP)"                                                                                                                         
## [492,] "Government expenditure per student, primary (% of GDP per capita)"                                                                                                        
## [493,] "Public spending on education, secondary (% of GDP)"                                                                                                                       
## [494,] "Government expenditure per student, secondary (% of GDP per capita)"                                                                                                      
## [495,] "Public spending on education, tertiary (% of GDP)"                                                                                                                        
## [496,] "Government expenditure per student, tertiary (% of GDP per capita)"                                                                                                       
## [497,] "Government expenditure on education, total (% of GDP)"                                                                                                                    
## [498,] "Rail traffic (km per million US$ GDP)"                                                                                                                                    
## [499,] "Current health expenditure (% of GDP)"                                                                                                                                    
## [500,] "Domestic general government health expenditure (% of GDP)"                                                                                                                
## [501,] "Public Expenditure on Health (% GDP)"                                                                                                                                     
## [502,] "Capital health expenditure (% of GDP)"                                                                                                                                    
## [503,] "Health expenditure, private (% of GDP)"                                                                                                                                   
## [504,] "Health expenditure, public (% of GDP)"                                                                                                                                    
## [505,] "Health expenditure, total (% of GDP)"                                                                                                                                     
## [506,] "GDP per person employed (constant 2017 PPP $)"                                                                                                                            
## [507,] "GDP per person employed (annual % growth)"                                                                                                                                
## [508,] "GDP per person employed, index (1980 = 100)"                                                                                                                              
## [509,] "Trade (% of GDP, PPP)"                                                                                                                                                    
## [510,] "Merchandise trade (% of GDP)"                                                                                                                                             
## [511,] "Trade in goods (% of goods GDP)"                                                                                                                                          
## [512,] "Government expenditure on pre-primary education as % of GDP (%)"                                                                                                          
## [513,] "Initial government funding of pre-primary education as a percentage of GDP (%)"                                                                                           
## [514,] "Government expenditure on primary education as % of GDP (%)"                                                                                                              
## [515,] "Initial government funding of primary education as a percentage of GDP (%)"                                                                                               
## [516,] "Initial household funding of primary education as a percentage of GDP"                                                                                                    
## [517,] "Government expenditure on lower secondary education as a percentage of GDP (%)"                                                                                           
## [518,] "Initial government funding of lower secondary education as a percentage of GDP (%)"                                                                                       
## [519,] "Government expenditure on secondary education as % of GDP (%)"                                                                                                            
## [520,] "Initial household funding of secondary education as a percentage of GDP"                                                                                                  
## [521,] "Initial government funding of secondary education as a percentage of GDP (%)"                                                                                             
## [522,] "Government expenditure on secondary and post-secondary non-tertiary vocational education as % of GDP (%)"                                                                 
## [523,] "Government expenditure on upper secondary education as a percentage of GDP (%)"                                                                                           
## [524,] "Initial government funding of upper secondary education as a percentage of GDP (%)"                                                                                       
## [525,] "Government expenditure on post-secondary non-tertiary education as % of GDP (%)"                                                                                          
## [526,] "Government expenditure on tertiary education as % of GDP (%)"                                                                                                             
## [527,] "Initial government funding of tertiary education as a percentage of GDP (%)"                                                                                              
## [528,] "Initial household funding of tertiary education as a percentage of GDP"                                                                                                   
## [529,] "Initial government funding of education as a percentage of GDP (%)"                                                                                                       
## [530,] "Initial household funding of education as a percentage of GDP"                                                                                                            
## [531,] "Initial government funding per pre-primary student as a percentage of GDP per capita"                                                                                     
## [532,] "Initial government funding per primary student as a percentage of GDP per capita"                                                                                         
## [533,] "Initial household funding per primary student as a percentage of GDP per capita"                                                                                          
## [534,] "Initial government funding per lower secondary student as a percentage of GDP per capita"                                                                                 
## [535,] "Initial government funding per secondary student as a percentage of GDP per capita"                                                                                       
## [536,] "Initial household funding per secondary student as a percentage of GDP per capita"                                                                                        
## [537,] "Initial government funding per upper secondary student as a percentage of GDP per capita"                                                                                 
## [538,] "Initial government funding per tertiary student as a percentage of GDP per capita"                                                                                        
## [539,] "Initial household funding per tertiary student as a percentage of GDP per capita"

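# GDP: Gross Domestic Product. "Per capita" means per person; the KD suffix
# in the indicator code denotes constant US dollars.

# Download GDP per capita for Mexico, Canada and the United States, 1960-2012.
dat <- WDI(indicator = 'NY.GDP.PCAP.KD', country = c('MX', 'CA', 'US'), start = 1960, end = 2012)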

head(dat)

##   iso2c country NY.GDP.PCAP.KD year
## 1    CA  Canada       48785.94 2012
## 2    CA  Canada       48464.50 2011
## 3    CA  Canada       47448.01 2010
## 4    CA  Canada       46540.64 2009
## 5    CA  Canada       48495.20 2008
## 6    CA  Canada       48534.17 2007

tail(dat)

##     iso2c       country NY.GDP.PCAP.KD year
## 154    US United States       20831.39 1965
## 155    US United States       19824.67 1964
## 156    US United States       18999.97 1963
## 157    US United States       18463.01 1962
## 158    US United States       17671.22 1961
## 159    US United States       17562.67 1960
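The last row index above (159) matches 3 countries times 53 years, which suggests the download is complete. A quick profile makes that check explicit. The following is only a minimal sketch: `dat_sample` is a stand-in built from a few rows copied from the printed output so that it runs offline, and `profile_indicator()` is a hypothetical helper, not part of the WDI package.

```r
# Stand-in for `dat`: a few rows copied from the head()/tail() output above,
# so the sketch runs without a network connection.
dat_sample <- data.frame(
  iso2c   = c("CA", "CA", "CA", "US", "US", "US"),
  country = c("Canada", "Canada", "Canada",
              "United States", "United States", "United States"),
  NY.GDP.PCAP.KD = c(48785.94, 48464.50, 47448.01,
                     18463.01, 17671.22, 17562.67),
  year    = c(2012, 2011, 2010, 1962, 1961, 1960)
)

# Hypothetical helper: one row per country with basic profiling counts
# (number of rows, number of missing values, first and last year).
profile_indicator <- function(df, indicator) {
  pieces <- lapply(split(df, df$country), function(d) {
    data.frame(
      country    = d$country[1],
      n_rows     = nrow(d),
      n_missing  = sum(is.na(d[[indicator]])),
      first_year = min(d$year),
      last_year  = max(d$year)
    )
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

profile <- profile_indicator(dat_sample, "NY.GDP.PCAP.KD")
print(profile)
```

Applied to the full `dat` object, the same call would be expected to report 53 rows per country if no year is missing from the download.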

# One line per country: GDP per capita over time.
ggplot(dat, aes(year, NY.GDP.PCAP.KD, color = country)) + geom_line() +
    xlab('Year') + ylab('GDP per capita (constant US$)')

# Same indicator for Ecuador only; its ISO-2 country code is "EC".
dat2 <- WDI(indicator = 'NY.GDP.PCAP.KD', country = c("EC"), start = 1960, end = 2012)

head(dat2)

##   iso2c country NY.GDP.PCAP.KD year
## 1    EC Ecuador       5122.180 2012
## 2    EC Ecuador       4921.848 2011
## 3    EC Ecuador       4633.590 2010
## 4    EC Ecuador       4547.509 2009
## 5    EC Ecuador       4596.145 2008
## 6    EC Ecuador       4393.724 2007

# GDP per capita of Ecuador over time.
ggplot(dat2, aes(year, NY.GDP.PCAP.KD, color = country)) + geom_line() +
    xlab('Year') + ylab('GDP per capita (constant US$)')
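Note that the rows printed above come back with the most recent year first. Any calculation that depends on time order, such as year-over-year growth, needs the data sorted by year beforehand. A minimal sketch, assuming only the six Ecuador rows shown in the `head(dat2)` output (`ec_sample` and `growth_pct` are illustrative names, not objects from the original code):

```r
# Stand-in for `dat2`: the six rows printed by head(dat2) above.
ec_sample <- data.frame(
  year = 2012:2007,
  NY.GDP.PCAP.KD = c(5122.180, 4921.848, 4633.590,
                     4547.509, 4596.145, 4393.724)
)

# Sort ascending by year so that diff() compares each year with the previous one.
ec_sorted <- ec_sample[order(ec_sample$year), ]
gdp <- ec_sorted$NY.GDP.PCAP.KD

# Year-over-year growth in percent; the first year has no predecessor, hence NA.
ec_sorted$growth_pct <- c(NA, 100 * diff(gdp) / head(gdp, -1))

print(ec_sorted)
```

In this sample the only negative growth value is the one for 2009, since GDP per capita fell from 4596.145 in 2008 to 4547.509 in 2009.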

Data Cleaning and Data Quality Techniques

Instructor: Karen Calva

5/8/2021
