InspectDF
inspectdf: Inspection, Comparison and Visualisation of Data Frames
inspectdf is a collection of utilities for columnwise summary, comparison and visualisation of data frames.
Functions are provided to summarise missingness, categorical levels, numeric distributions, correlation, column types and memory usage.
The package has three aims: to speed up repetitive checking and exploratory tasks for data frames, to make it easy to compare data frames for differences and inconsistencies, and to support quick visualisation of data frames.
library(dplyr)
library(inspectdf)
Key functions
inspect_types() - summary of column types
inspect_mem() - summary of memory usage of columns
inspect_na() - columnwise prevalence of missing values
inspect_cor() - correlation coefficients of numeric columns
inspect_imb() - feature imbalance of categorical columns
inspect_num() - summaries of numeric columns
inspect_cat() - summaries of categorical columns
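To make "columnwise summary" concrete, here is a minimal base-R sketch of the kind of table inspect_na() reports: a count and percentage of missing values for each column, most-missing first. The helper na_by_column() and its output column names are hypothetical, chosen for illustration; this is not the package's implementation.

```r
# Hypothetical helper, not part of inspectdf: count and percentage of
# missing values in each column, similar in spirit to inspect_na()
na_by_column <- function(df) {
  cnt <- vapply(df, function(col) sum(is.na(col)), integer(1))
  out <- data.frame(col_name = names(df),
                    cnt = unname(cnt),
                    pcnt = 100 * unname(cnt) / nrow(df),
                    stringsAsFactors = FALSE)
  # Put the columns with the most missing values first
  out <- out[order(-out$pcnt), , drop = FALSE]
  rownames(out) <- NULL
  out
}

toy <- data.frame(a = c(1, NA, 3, NA),
                  b = c("x", "y", NA, "z"),
                  c = 1:4,
                  stringsAsFactors = FALSE)
na_by_column(toy)
```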
# Load dplyr for starwars data & pipe
library(dplyr)
# Single dataframe summary
inspect_cat(starwars)
## # A tibble: 8 x 5
## col_name cnt common common_pcnt levels
## <chr> <int> <chr> <dbl> <named list>
## 1 eye_color 15 brown 24.1 <tibble [15 x 3]>
## 2 gender 3 masculine 75.9 <tibble [3 x 3]>
## 3 hair_color 13 none 42.5 <tibble [13 x 3]>
## 4 homeworld 49 Naboo 12.6 <tibble [49 x 3]>
## 5 name 87 Ackbar 1.15 <tibble [87 x 3]>
## 6 sex 5 male 69.0 <tibble [5 x 3]>
## 7 skin_color 31 fair 19.5 <tibble [31 x 3]>
## 8 species 38 Human 40.2 <tibble [38 x 3]>
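Each row of the single-data-frame summary describes one categorical column: cnt is the number of distinct levels, common the most frequent level and common_pcnt that level's share of rows. For example, name has 87 levels and its most common value accounts for 1.15% of rows, i.e. 1 of 87, so every name is unique. The sketch below reproduces that logic for a single vector in base R, assuming that missing values count as a level of their own; cat_summary() is a hypothetical helper, not part of inspectdf.

```r
# Hypothetical helper, not part of inspectdf: number of distinct levels,
# most common level and its percentage of all values. Treating NA as a
# level of its own is an assumption, not documented inspectdf behaviour.
cat_summary <- function(x) {
  tab <- table(x, useNA = "ifany")
  top <- which.max(tab)  # first level with the highest count
  data.frame(cnt = length(tab),
             common = names(tab)[top],
             common_pcnt = 100 * as.numeric(tab[top]) / length(x),
             stringsAsFactors = FALSE)
}

eyes <- c("brown", "blue", "brown", "yellow", "brown", NA)
cat_summary(eyes)
```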
library(dplyr)
# Paired dataframe comparison
inspect_cat(starwars, starwars[1:20, ])
## # A tibble: 8 x 5
## col_name jsd fisher_p lvls_1 lvls_2
## <chr> <dbl> <dbl> <named list> <named list>
## 1 eye_color 0.0936 0.750 <tibble [15 x 3]> <tibble [8 x 3]>
## 2 gender 0.0387 0.353 <tibble [3 x 3]> <tibble [2 x 3]>
## 3 hair_color 0.261 0.000843 <tibble [13 x 3]> <tibble [10 x 3]>
## 4 homeworld 0.394 0.359 <tibble [49 x 3]> <tibble [11 x 3]>
## 5 name 0.573 1.00 <tibble [87 x 3]> <tibble [20 x 3]>
## 6 sex 0.0526 0.287 <tibble [5 x 3]> <tibble [4 x 3]>
## 7 skin_color 0.288 0.990 <tibble [31 x 3]> <tibble [10 x 3]>
## 8 species 0.300 0.807 <tibble [38 x 3]> <tibble [6 x 3]>
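In the paired comparison, jsd is presumably the Jensen-Shannon divergence between the level proportions of each column in the two data frames (0 means identical distributions), and fisher_p the p-value of a Fisher's exact test on the level counts. A minimal base-R sketch of the divergence follows. It assumes base-2 logarithms, which bound the result between 0 and 1, and inputs without missing values; inspectdf's own computation may differ in those details.

```r
# Sketch of the Jensen-Shannon divergence between two categorical vectors,
# computed from their level proportions. Assumes base-2 logs (result in [0, 1])
# and no missing values; this is not inspectdf's own code.
jsd <- function(x, y) {
  lvls <- union(x, y)
  p <- as.numeric(table(factor(x, levels = lvls))) / length(x)
  q <- as.numeric(table(factor(y, levels = lvls))) / length(y)
  m <- (p + q) / 2
  # Kullback-Leibler divergence; terms where a == 0 contribute 0
  kl <- function(a, b) {
    keep <- a > 0
    sum(a[keep] * log2(a[keep] / b[keep]))
  }
  (kl(p, m) + kl(q, m)) / 2
}

jsd(c("a", "b", "b"), c("a", "b", "b"))  # returns 0 (identical distributions)
jsd(c("a", "a"), c("b", "b"))            # returns 1 (no levels in common)
```

Because m averages the two distributions, every level present in either input has m > 0, so the log ratio is always defined.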
# Grouped dataframe summary
starwars %>% group_by(species) %>% inspect_cat()
## # A tibble: 266 x 6
## # Groups: species [38]
## species col_name cnt common common_pcnt levels
## <chr> <chr> <int> <chr> <dbl> <named list>
## 1 Aleena eye_color 1 unknown 100 <tibble [1 x 3]>
## 2 Aleena gender 1 masculine 100 <tibble [1 x 3]>
## 3 Aleena hair_color 1 none 100 <tibble [1 x 3]>
## 4 Aleena homeworld 1 Aleen Minor 100 <tibble [1 x 3]>
## 5 Aleena name 1 Ratts Tyerell 100 <tibble [1 x 3]>
## 6 Aleena sex 1 male 100 <tibble [1 x 3]>
## 7 Aleena skin_color 1 grey, blue 100 <tibble [1 x 3]>
## 8 Besalisk eye_color 1 yellow 100 <tibble [1 x 3]>
## 9 Besalisk gender 1 masculine 100 <tibble [1 x 3]>
## 10 Besalisk hair_color 1 none 100 <tibble [1 x 3]>
## # ... with 256 more rows
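With a grouped data frame the summary is repeated within each group. The row count checks out: 38 species groups times the 7 categorical columns other than species gives 266 rows. The base-R sketch below shows the same split-apply-combine pattern on a toy data frame; grouped_common() is a hypothetical helper that only mimics the cnt, common and common_pcnt columns and is not how inspectdf implements grouping.

```r
# Hypothetical base-R sketch, not inspectdf's implementation: apply a
# per-column "most common level" summary within each group of a data frame.
grouped_common <- function(df, group_col) {
  value_cols <- setdiff(names(df), group_col)
  pieces <- lapply(split(df, df[[group_col]]), function(sub) {
    do.call(rbind, lapply(value_cols, function(col) {
      tab <- table(sub[[col]], useNA = "ifany")
      top <- which.max(tab)  # first level with the highest count
      data.frame(group = sub[[group_col]][1],
                 col_name = col,
                 cnt = length(tab),
                 common = names(tab)[top],
                 common_pcnt = 100 * as.numeric(tab[top]) / nrow(sub),
                 stringsAsFactors = FALSE)
    }))
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

toy <- data.frame(species = c("Human", "Human", "Droid"),
                  eye_color = c("brown", "blue", "red"),
                  sex = c("male", "male", "none"),
                  stringsAsFactors = FALSE)
grouped_common(toy, "species")
```

As in the real output, the grouping column is excluded from the summarised columns, so 2 groups times 2 remaining columns give 4 rows.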