Theory
The DataExplorer package is one of the best-known R libraries for
exploratory data analysis. It is simple to use yet powerful: a single call
produces a report with a large amount of information.
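The function that builds the full report is create_report(). To produce
each output individually, use these functions:
- introduce(): table of basic metadata (rows, columns, missing values, memory use)
- plot_intro(): chart of the same metadata
- plot_boxplot(): boxplots of the continuous features, grouped by a feature given in its by argument
- plot_missing(): percentage of missing values per feature
- plot_histogram(): histograms of the continuous features
- plot_bar(): bar charts of the discrete features
- plot_correlation(): correlation heatmap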
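The sketch below shows some of these functions in use on R's built-in `airquality` data set, so you can try them without the flights data. The data set is an illustrative choice and is not part of the original analysis. The sketch assumes DataExplorer is already installed.

```r
library(DataExplorer)

# airquality: 153 daily air-quality measurements, with NAs in Ozone and Solar.R
info <- introduce(airquality)   # one-row table of basic metadata
print(info)

plot_intro(airquality)          # chart of the same metadata
plot_missing(airquality)        # share of missing values per column
```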
Install packages and load libraries
# install.packages("DataExplorer")
library(DataExplorer)
## Warning: package 'DataExplorer' was built under R version 4.3.2
#install.packages("nycflights13")
library(nycflights13)
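The commented-out install.packages() lines above must be toggled by hand. A common alternative installs a package only when it is missing and then loads it. This sketch assumes that installing from the main CRAN mirror (https://cloud.r-project.org) is acceptable.

```r
# Packages used in this document
pkgs <- c("DataExplorer", "nycflights13")

# Install only the packages that are not already available
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Attach them
invisible(lapply(pkgs, library, character.only = TRUE))
```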
Context
The nycflights13 package contains data on all flights that departed
from New York City (airports EWR, JFK, and LGA) to destinations in the United
States in 2013: 336,776 flights in total.
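You can check the figures quoted above directly against the `flights` table:

```r
library(nycflights13)

nrow(flights)            # total number of flights in the table
table(flights$origin)    # flights per New York airport
unique(flights$year)     # every departure is from 2013
```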
The package's tables and their relationships are as follows:
[Figure: relational diagram of the nycflights13 tables]
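# Copy the package tables into the workspace
flights <- flights
weather <- weather
planes <- planes
airports <- airports
airlines <- airlines
# Add the airline name to each flight (inner join on carrier)
df <- merge(flights, airlines, by = "carrier")
# Add aircraft data. merge() is an inner join, so flights whose tailnum is
# not in planes are dropped; the shared year column becomes year.x
# (flight year) and year.y (year the plane was manufactured)
df <- merge(df, planes, by = "tailnum")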
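By default, `merge()` performs an inner join. That is why the merged table has fewer rows than `flights`. The sketch below compares it with a left join (`all.x = TRUE`) that keeps every flight. The object names `inner` and `left` are illustrative only.

```r
library(nycflights13)

with_names <- merge(flights, airlines, by = "carrier")

# Inner join (merge's default, all = FALSE): unmatched tailnums are dropped
inner <- merge(with_names, planes, by = "tailnum")

# Left join: every flight is kept; plane columns are NA when there is no match
left <- merge(with_names, planes, by = "tailnum", all.x = TRUE)

nrow(flights) - nrow(inner)                # flights lost by the inner join
sum(!flights$tailnum %in% planes$tailnum)  # flights with no matching plane
```

Which join to use depends on the question. The inner join keeps only flights with known aircraft data. The left join keeps all flights, at the cost of more missing values in the plane columns.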
Exploratory analysis
# create_report(df)
introduce(df)
##     rows columns discrete_columns continuous_columns all_missing_columns
## 1 284170      28               10                 18                   0
##   total_missing_values complete_rows total_observations memory_usage
## 1               311768           920            7956760     50225296
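Several fields of the introduce() output can be reproduced with base R, which helps when reading the table. For example, total_observations is simply rows × columns: 284170 × 28 = 7956760. This sketch assumes that DataExplorer treats numeric columns as continuous and all other columns as discrete.

```r
library(nycflights13)

df <- merge(merge(flights, airlines, by = "carrier"), planes, by = "tailnum")

n_rows  <- nrow(df)
n_cols  <- ncol(df)
n_cells <- n_rows * n_cols                           # total_observations
n_na    <- sum(is.na(df))                            # total_missing_values
n_full  <- sum(complete.cases(df))                   # complete_rows
n_cont  <- sum(vapply(df, is.numeric, logical(1)))   # continuous_columns (assumed rule)

c(rows = n_rows, columns = n_cols, total_observations = n_cells,
  total_missing_values = n_na, complete_rows = n_full,
  continuous_columns = n_cont)
```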
plot_intro(df)

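# plot_boxplot() requires a by argument (the feature used to group the boxes);
# "origin" is only an illustrative choice: plot_boxplot(df, by = "origin")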
plot_missing(df)
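plot_missing() charts the share of missing values in each column. The sketch below lists the same figures as text using base R:

```r
library(nycflights13)

df <- merge(merge(flights, airlines, by = "carrier"), planes, by = "tailnum")

# Fraction of NA values per column, largest first
pct_missing <- sort(colMeans(is.na(df)), decreasing = TRUE)

# Show only the columns that have any missing values, as percentages
round(100 * pct_missing[pct_missing > 0], 2)
```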

plot_histogram(df)


plot_bar(df)
## 4 columns ignored with more than 50 categories.
## tailnum: 3322 categories
## dest: 104 categories
## time_hour: 6934 categories
## model: 127 categories
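The messages printed by plot_bar() and plot_correlation() come from their `maxcat` arguments. Judging by the messages above, the defaults are 50 and 20 categories. The sketch below uses a hypothetical helper, `n_categories()`, to count distinct values in each non-numeric column and show which columns each threshold excludes. Passing a larger limit would keep more columns. For example, `plot_bar(df, maxcat = 110)` would keep `dest` (104 categories) in the bar charts.

```r
library(nycflights13)

df <- merge(merge(flights, airlines, by = "carrier"), planes, by = "tailnum")

# Hypothetical helper: number of distinct values in each non-numeric column
n_categories <- function(data) {
  discrete <- data[!vapply(data, is.numeric, logical(1))]
  counts <- vapply(discrete, function(x) length(unique(x)), integer(1))
  sort(counts, decreasing = TRUE)
}

cats <- n_categories(df)
cats
names(cats)[cats > 50]  # columns skipped by plot_bar() at maxcat = 50
names(cats)[cats > 20]  # columns skipped by plot_correlation() at maxcat = 20
```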

plot_correlation(df)
## 5 features with more than 20 categories ignored!
## tailnum: 3322 categories
## dest: 104 categories
## time_hour: 6934 categories
## manufacturer: 35 categories
## model: 127 categories
## Warning in cor(x = structure(list(year.x = c(2013L, 2013L, 2013L, 2013L, : the
## standard deviation is zero
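The "standard deviation is zero" warning comes from `year.x`, the flight year carried over from `flights`. Every value is 2013, so its correlation with any other column is undefined. The sketch below drops constant columns before plotting. It assumes your installed DataExplorer version supports the `cor_args` argument of plot_correlation(). It also switches to pairwise-complete correlations, which is a choice made here rather than in the original analysis.

```r
library(DataExplorer)
library(nycflights13)

df <- merge(merge(flights, airlines, by = "carrier"), planes, by = "tailnum")

# Columns with at most one distinct non-NA value have zero standard deviation
is_constant <- vapply(df, function(x) length(unique(x[!is.na(x)])) <= 1,
                      logical(1))
names(df)[is_constant]

# Correlate only the non-constant continuous features, using pairwise-complete
# observations so that NAs do not blank out whole rows of the matrix
plot_correlation(df[, !is_constant], type = "continuous",
                 cor_args = list(use = "pairwise.complete.obs"))
```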
