InspectDF

inspectdf: Inspection, Comparison and Visualisation of Data Frames

inspectdf is a collection of utilities for columnwise summary, comparison and visualisation of data frames.

Functions are provided to summarise missingness, categorical levels, numeric distribution, correlation, column types and memory usage.
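The column-type and memory summaries report how many columns of each class a data frame has and how much space each column takes. The base-R sketch below shows that kind of information. summarise_types_mem() is a hypothetical helper written for illustration, not part of inspectdf, and utils::object.size() gives only an approximate size for each column.

```r
# Hypothetical helper, not part of inspectdf: tabulate column classes and
# estimate how much memory each column occupies.
summarise_types_mem <- function(df) {
  # First class of each column, e.g. "integer", "numeric", "character"
  types <- vapply(df, function(x) class(x)[1], character(1))
  # Approximate size of each column in bytes (object.size() is an estimate)
  bytes <- vapply(df, function(x) as.numeric(utils::object.size(x)), numeric(1))
  list(
    # Number of columns of each class, most common first
    types = sort(table(types), decreasing = TRUE),
    # Per-column memory and its share of the total
    mem = data.frame(col_name = names(df), bytes = bytes,
                     pcnt = 100 * bytes / sum(bytes),
                     row.names = NULL, stringsAsFactors = FALSE)
  )
}

df <- data.frame(id = 1:5, score = c(2.5, NA, 3.1, 4.0, 1.2),
                 grp = c("a", "b", "a", "a", "b"), stringsAsFactors = FALSE)
res <- summarise_types_mem(df)
res$types
res$mem
```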

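The package has three aims:

  • to speed up repetitive checking and exploratory tasks for data frames
  • to make it easier to compare data frames for differences and inconsistencies
  • to support quick visualisation of data frames
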
library(dplyr)
library(inspectdf)

Key functions

  • inspect_types() - summary of column types
  • inspect_mem() - summary of memory usage of columns
  • inspect_na() - columnwise prevalence of missing values
  • inspect_cor() - correlation coefficients of numeric columns
  • inspect_imb() - feature imbalance of categorical columns
  • inspect_num() - summaries of numeric columns
  • inspect_cat() - summaries of categorical columns
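As a rough illustration of what a columnwise missingness summary contains, the sketch below counts the NA values in each column and converts them to percentages of rows, listing the most-missing columns first. na_summary() is a hypothetical base-R helper, not how inspect_na() is implemented.

```r
# Hypothetical helper, not inspectdf's implementation: count and percentage
# of missing values in each column, most-missing columns first.
na_summary <- function(df) {
  cnt <- vapply(df, function(x) sum(is.na(x)), integer(1))
  out <- data.frame(col_name = names(df), cnt = cnt,
                    pcnt = 100 * cnt / nrow(df),
                    row.names = NULL, stringsAsFactors = FALSE)
  out[order(-out$cnt), , drop = FALSE]
}

df <- data.frame(a = c(1, NA, 3, NA), b = c("x", "y", NA, "z"),
                 n = 1:4, stringsAsFactors = FALSE)
na_summary(df)
```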

# Load dplyr for starwars data & pipe

library(dplyr)

# Single dataframe summary

inspect_cat(starwars)
## # A tibble: 8 x 5
##   col_name     cnt common    common_pcnt levels           
##   <chr>      <int> <chr>           <dbl> <named list>     
## 1 eye_color     15 brown           24.1  <tibble [15 x 3]>
## 2 gender         3 masculine       75.9  <tibble [3 x 3]> 
## 3 hair_color    13 none            42.5  <tibble [13 x 3]>
## 4 homeworld     49 Naboo           12.6  <tibble [49 x 3]>
## 5 name          87 Ackbar           1.15 <tibble [87 x 3]>
## 6 sex            5 male            69.0  <tibble [5 x 3]> 
## 7 skin_color    31 fair            19.5  <tibble [31 x 3]>
## 8 species       38 Human           40.2  <tibble [38 x 3]>
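The cnt, common and common_pcnt columns above can be approximated in base R. They give the number of distinct levels, the most frequent level, and that level's percentage of rows. cat_summary() is a hypothetical helper, not inspectdf's code. Counting NA as a level of its own is our assumption. which.max() keeps the first maximum in table() order, and the tie-breaking rule inspectdf itself uses is not shown here.

```r
# Hypothetical helper: one row per character/factor column with the number
# of levels, the most common level and its percentage of rows.
cat_summary <- function(df) {
  is_cat <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))
  rows <- lapply(names(df)[is_cat], function(col) {
    # Level frequencies; useNA = "ifany" treats NA as its own level (assumption)
    tab <- table(df[[col]], useNA = "ifany")
    top <- which.max(tab)  # first maximum in table() order
    data.frame(col_name = col,
               cnt = length(tab),
               common = names(tab)[top],
               common_pcnt = 100 * as.numeric(tab[top]) / sum(tab),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

df <- data.frame(colour = c("red", "blue", "red", NA),
                 size = c("S", "M", "L", "M"),
                 n = 1:4, stringsAsFactors = FALSE)
cat_summary(df)
```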

library(dplyr)
# Paired dataframe comparison
inspect_cat(starwars, starwars[1:20, ])
## # A tibble: 8 x 5
##   col_name      jsd fisher_p lvls_1            lvls_2           
##   <chr>       <dbl>    <dbl> <named list>      <named list>     
## 1 eye_color  0.0936 0.750    <tibble [15 x 3]> <tibble [8 x 3]> 
## 2 gender     0.0387 0.353    <tibble [3 x 3]>  <tibble [2 x 3]> 
## 3 hair_color 0.261  0.000843 <tibble [13 x 3]> <tibble [10 x 3]>
## 4 homeworld  0.394  0.359    <tibble [49 x 3]> <tibble [11 x 3]>
## 5 name       0.573  1.00     <tibble [87 x 3]> <tibble [20 x 3]>
## 6 sex        0.0526 0.287    <tibble [5 x 3]>  <tibble [4 x 3]> 
## 7 skin_color 0.288  0.990    <tibble [31 x 3]> <tibble [10 x 3]>
## 8 species    0.300  0.807    <tibble [38 x 3]> <tibble [6 x 3]>
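In the paired comparison, the jsd column appears to be a Jensen-Shannon divergence between each column's level frequencies in the two data frames. It is 0 for identical distributions and 1 for completely disjoint ones when base-2 logarithms are used. The sketch below computes the measure in base R for two vectors of labels. jsd() is a hypothetical helper, missing values are simply dropped, and the base-2 logarithm is an assumption.

```r
# Hypothetical helper: Jensen-Shannon divergence (base-2 logs) between the
# level frequencies of two categorical vectors. Missing values are dropped.
jsd <- function(x, y) {
  x <- as.character(x[!is.na(x)])
  y <- as.character(y[!is.na(y)])
  lv <- union(x, y)                                   # shared set of levels
  p <- as.numeric(table(factor(x, levels = lv))) / length(x)
  q <- as.numeric(table(factor(y, levels = lv))) / length(y)
  m <- (p + q) / 2                                    # mixture distribution
  # Kullback-Leibler divergence of a from b, taking 0 * log(0) as 0
  kl <- function(a, b) sum(ifelse(a > 0, a * log2(a / b), 0))
  (kl(p, m) + kl(q, m)) / 2
}

full <- c(rep("a", 5), rep("b", 5))
jsd(full, full)                 # 0: identical distributions
jsd(full, rep("a", 5))          # about 0.311: one level absent from the second
jsd(c("a", "a"), c("b", "b"))   # 1: no levels in common
```

As a rough check, the same calculation for 87 distinct names against the first 20 of them gives about 0.573. This matches the jsd value for name in the table above.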

# Grouped dataframe summary
starwars %>% group_by(species) %>% inspect_cat()
## # A tibble: 266 x 6
## # Groups:   species [38]
##    species  col_name     cnt common        common_pcnt levels          
##    <chr>    <chr>      <int> <chr>               <dbl> <named list>    
##  1 Aleena   eye_color      1 unknown               100 <tibble [1 x 3]>
##  2 Aleena   gender         1 masculine             100 <tibble [1 x 3]>
##  3 Aleena   hair_color     1 none                  100 <tibble [1 x 3]>
##  4 Aleena   homeworld      1 Aleen Minor           100 <tibble [1 x 3]>
##  5 Aleena   name           1 Ratts Tyerell         100 <tibble [1 x 3]>
##  6 Aleena   sex            1 male                  100 <tibble [1 x 3]>
##  7 Aleena   skin_color     1 grey, blue            100 <tibble [1 x 3]>
##  8 Besalisk eye_color      1 yellow                100 <tibble [1 x 3]>
##  9 Besalisk gender         1 masculine             100 <tibble [1 x 3]>
## 10 Besalisk hair_color     1 none                  100 <tibble [1 x 3]>
## # ... with 256 more rows
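Grouped summaries follow a split-apply-combine pattern. The data frame is split by the grouping column, and each piece is summarised separately with the grouping column itself left out. Here, 38 species groups times the 7 remaining categorical columns gives the 266 rows shown. grouped_cat_summary() is a hypothetical base-R sketch of that idea, not inspectdf's implementation. Unlike group_by(), base split() drops rows whose grouping value is NA.

```r
# Hypothetical helper: split by a grouping column, then summarise every other
# character/factor column within each group (cnt, common, common_pcnt).
grouped_cat_summary <- function(df, group) {
  is_cat <- function(x) is.character(x) || is.factor(x)
  # split() silently drops rows whose grouping value is NA
  pieces <- lapply(split(df, df[[group]]), function(g) {
    cols <- setdiff(names(g)[vapply(g, is_cat, logical(1))], group)
    do.call(rbind, lapply(cols, function(col) {
      tab <- table(g[[col]], useNA = "ifany")  # NA counted as a level
      top <- which.max(tab)                    # first maximum in table() order
      data.frame(group_value = as.character(g[[group]][1]), col_name = col,
                 cnt = length(tab), common = names(tab)[top],
                 common_pcnt = 100 * as.numeric(tab[top]) / sum(tab),
                 stringsAsFactors = FALSE)
    }))
  })
  out <- do.call(rbind, unname(pieces))
  names(out)[1] <- group  # name the first column after the grouping column
  rownames(out) <- NULL
  out
}

df <- data.frame(sp = c("x", "x", "y"), eye = c("r", "r", "b"),
                 hair = c("n", "m", "n"), stringsAsFactors = FALSE)
grouped_cat_summary(df, "sp")
```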