Basic operations in R - part 4.

Data cleaning

Homework

Using the “germancredit” dataset, which describes the credit rating (creditability) of selected customers of a certain bank:

Are there any missing observations in the dataset?

Bin the variable “age.in.years” (age in years) by the credit rating “creditability”.

Report and interpret the information indices. Assess the skewness of the quantitative variables.

Check whether the variable “age.in.years” (age in years) contains outliers. If it does, impute them using a method of your choice.

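library(scorecard) # provides the germancredit dataset
library(dlookr)    # binning_by(), find_skewness(), imputate_outlier(), extract()
library(VIM)       # aggr()

data("germancredit")
attach(germancredit)

sum(is.na(germancredit)) # total number of missing values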
## [1] 0
aggr(germancredit)
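
A single total of zero answers the question here, but when a dataset does contain gaps it helps to see where they are. The sketch below uses base R only; the data frame `df` and its values are hypothetical stand-ins, not part of germancredit.

```r
# Toy data frame standing in for a dataset with gaps (hypothetical values)
df <- data.frame(
  age     = c(23, NA, 41, 35),
  amount  = c(1200, 3400, NA, NA),
  purpose = c("car", "tv", NA, "car")
)

# Number and percentage of missing values per column
na_count <- colSums(is.na(df))
na_share <- round(100 * colMeans(is.na(df)), 1)
print(data.frame(na_count, na_share))

# Number of rows without any missing value
complete <- sum(complete.cases(df))
print(complete)
```

The same two calls, `colSums(is.na(...))` and `complete.cases(...)`, can be applied to germancredit directly.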

bin <- binning_by(germancredit, y = "creditability", x = "age.in.years") # bin age by credit rating
## Warning in binning_by(germancredit, y = "creditability", x = "age.in.years"): The factor y has been changed to a numeric vector consisting of 0 and 1.
## 'good' changed to 1 (positive) and 'bad' changed to 0 (negative).
plot(bin)

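summary(bin) # the (25,75] bin has a positive WoE (0.139), so customers older than 25 have better creditworthiness than the [19,25] bin (WoE -0.529); total IV = 0.073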
## ── Binning Table ──────────────────────── Several Metrics ── 
##       Bin CntRec CntPos CntNeg RatePos RateNeg    Odds      WoE      IV     JSD
## 1 [19,25]    190    110     80 0.15714 0.26667 1.37500 -0.52884 0.05792 0.00716
## 2 (25,75]    810    590    220 0.84286 0.73333 2.68182  0.13920 0.01525 0.00190
## 3   Total   1000    700    300 1.00000 1.00000 2.33333       NA 0.07317 0.00906
##       AUC
## 1 0.02095
## 2 0.42429
## 3 0.44524
## 
## ── General Metrics ───────────────────────────────────────── 
## • Gini index                       :  -0.10952
## • IV (Jeffrey)                     :  0.07317
## • JS (Jensen-Shannon) Divergence   :  0.00906
## • Kolmogorov-Smirnov Statistics    :  0.10952
## • HHI (Herfindahl-Hirschman Index) :  0.6922
## • HHI (normalized)                 :  0.3844
## • Cramer's V                       :  0.12794 
## 
## ── Significance Tests ──────────────────── Chisquare Test ── 
##     Bin A   Bin B statistics      p_value
## 1 [19,25] (25,75]    16.3681 0.0000521562
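
The information indices in the table can be reproduced by hand from the bin counts, which makes them easier to interpret. The sketch below uses only the CntPos and CntNeg values printed above; the variable names are illustrative. By a commonly quoted rule of thumb, an IV between roughly 0.02 and 0.1 indicates weak predictive power, so age split into these two bins separates good from bad customers only weakly, even though the chi-square test shows the difference is statistically significant.

```r
# Counts of good (positive) and bad (negative) customers per bin, from the table
bins <- data.frame(
  bin = c("[19,25]", "(25,75]"),
  pos = c(110, 590),
  neg = c(80, 220)
)

# Share of all positives / negatives that fall into each bin
bins$rate_pos <- bins$pos / sum(bins$pos)
bins$rate_neg <- bins$neg / sum(bins$neg)

# Weight of evidence and each bin's contribution to the information value
bins$woe <- log(bins$rate_pos / bins$rate_neg)
bins$iv  <- (bins$rate_pos - bins$rate_neg) * bins$woe
iv_total <- sum(bins$iv)
print(bins)
print(iv_total)

# Pearson chi-square test without continuity correction, and Cramer's V
chi <- chisq.test(as.matrix(bins[, c("pos", "neg")]), correct = FALSE)
cramers_v <- sqrt(unname(chi$statistic) / sum(bins$pos + bins$neg))
print(chi$statistic)
print(cramers_v)
```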
find_skewness(germancredit, value = TRUE, thres = 0.1) # assess the skewness of the quantitative variables
##                                        duration.in.month 
##                                                    1.093 
##                                            credit.amount 
##                                                    1.947 
##      installment.rate.in.percentage.of.disposable.income 
##                                                   -0.531 
##                                  present.residence.since 
##                                                   -0.272 
##                                             age.in.years 
##                                                    1.019 
##                  number.of.existing.credits.at.this.bank 
##                                                    1.271 
## number.of.people.being.liable.to.provide.maintenance.for 
##                                                    1.907
# Skewness exceeds 1 for five of the seven quantitative variables, which is high. credit.amount is the most skewed (1.947): a positive value means a long right tail, i.e. most loans are for relatively small amounts and a few are very large.
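
To make the skewness figures more concrete, here is a minimal sketch of the moment-based skewness coefficient and of how a log transform can reduce right skew. The helper `skew()` and the simulated `amount` vector are illustrative assumptions; the estimator used by `find_skewness()` may differ slightly in its small-sample correction.

```r
# Moment-based skewness: third central moment over the 1.5 power of the second
skew <- function(x) {
  x  <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)
  m3 <- mean((x - mean(x))^3)
  m3 / m2^1.5
}

# Simulated right-skewed "loan amounts" (hypothetical data, not germancredit)
set.seed(1)
amount <- rlnorm(1000, meanlog = 8, sdlog = 0.8)

skew_raw <- skew(amount)       # clearly positive: long right tail
skew_log <- skew(log(amount))  # close to zero after the log transform
print(c(raw = skew_raw, log = skew_log))
```

A log or square-root transform is a common remedy for positively skewed variables such as credit.amount.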
  
boxplot(germancredit$age.in.years)

wiek <- imputate_outlier(germancredit, age.in.years, method = "capping") # impute age outliers by capping
summary(wiek)
## Impute outliers with capping
## 
## * Information of Imputation (before vs after)
##                     Original    Imputation 
## described_variables "value"     "value"    
## n                   "1000"      "1000"     
## na                  "0"         "0"        
## mean                "35.546"    "35.350"   
## sd                  "11.3755"   "10.8530"  
## se_mean             "0.359724"  "0.343202" 
## IQR                 "15"        "15"       
## skewness            "1.020739"  "0.821878" 
## kurtosis            " 0.595780" "-0.132573"
## p00                 "19"        "19"       
## p01                 "20"        "20"       
## p05                 "22"        "22"       
## p10                 "23"        "23"       
## p20                 "26"        "26"       
## p25                 "27"        "27"       
## p30                 "28"        "28"       
## p40                 "30"        "30"       
## p50                 "33"        "33"       
## p60                 "36"        "36"       
## p70                 "39"        "39"       
## p75                 "42"        "42"       
## p80                 "45"        "45"       
## p90                 "52"        "52"       
## p95                 "60"        "60"       
## p99                 "67.01"     "63.00"    
## p100                "75"        "64"
plot(wiek)
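
The summary above is consistent with the usual boxplot rule: with Q1 = 27 and Q3 = 42 the IQR is 15, so the upper fence is 42 + 1.5 * 15 = 64.5, and after imputation the maximum is 64. The lower fence is 4.5, below the minimum age of 19, so only high ages are affected. The sketch below shows the idea in base R, assuming that capping replaces values beyond the fences with the 5th or 95th percentile; `cap_outliers()` and the `ages` vector are hypothetical, and the fences here are computed with `quantile()`, which may differ slightly from the hinges used by a boxplot.

```r
# Replace values outside the 1.5 * IQR fences with the 5th / 95th percentile
cap_outliers <- function(x) {
  q     <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  iqr   <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[2] + 1.5 * iqr
  p     <- unname(quantile(x, c(0.05, 0.95), na.rm = TRUE))
  x[!is.na(x) & x < lower] <- p[1]
  x[!is.na(x) & x > upper] <- p[2]
  x
}

# Toy ages with one clearly extreme value (hypothetical data)
ages   <- c(19, 22, 25, 27, 30, 33, 36, 39, 42, 45, 75)
capped <- cap_outliers(ages)
print(rbind(ages, capped))
```

Capping keeps the sample size at 1000 and leaves the bulk of the distribution untouched, which is why only the top percentiles (p99, p100), the mean, and the shape statistics change in the table.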

extract(bin) # bin label assigned to each of the 1000 observations
##    [1] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##   [10] (25,75] [19,25] [19,25] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25]
##   [19] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##   [28] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
##   [37] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75]
##   [46] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75]
##   [55] (25,75] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75]
##   [64] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75] [19,25] (25,75] (25,75]
##   [73] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75]
##   [82] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##   [91] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [100] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [109] (25,75] (25,75] (25,75] [19,25] (25,75] [19,25] (25,75] (25,75] (25,75]
##  [118] (25,75] [19,25] (25,75] [19,25] (25,75] [19,25] (25,75] (25,75] (25,75]
##  [127] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25]
##  [136] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
##  [145] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
##  [154] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [163] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75]
##  [172] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [181] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
##  [190] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
##  [199] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
##  [208] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [217] (25,75] [19,25] [19,25] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
##  [226] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75] [19,25]
##  [235] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
##  [244] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75]
##  [253] [19,25] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75]
##  [262] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
##  [271] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [280] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [289] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
##  [298] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [307] [19,25] (25,75] [19,25] [19,25] (25,75] [19,25] (25,75] [19,25] (25,75]
##  [316] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [325] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
##  [334] [19,25] [19,25] (25,75] [19,25] [19,25] (25,75] (25,75] [19,25] (25,75]
##  [343] [19,25] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] [19,25]
##  [352] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25]
##  [361] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] [19,25] (25,75]
##  [370] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [379] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] [19,25] [19,25]
##  [388] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75]
##  [397] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [406] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
##  [415] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] [19,25] (25,75] (25,75]
##  [424] (25,75] [19,25] [19,25] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
##  [433] (25,75] (25,75] [19,25] [19,25] [19,25] (25,75] (25,75] (25,75] (25,75]
##  [442] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [451] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [460] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [469] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25]
##  [478] [19,25] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [487] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [496] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [505] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [514] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25]
##  [523] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [532] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [541] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
##  [550] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75]
##  [559] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75]
##  [568] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] [19,25] (25,75] [19,25]
##  [577] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [586] [19,25] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
##  [595] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [604] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75]
##  [613] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75]
##  [622] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [631] (25,75] (25,75] [19,25] [19,25] [19,25] (25,75] (25,75] [19,25] (25,75]
##  [640] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [649] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75]
##  [658] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75] (25,75]
##  [667] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75]
##  [676] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [685] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75]
##  [694] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
##  [703] (25,75] (25,75] (25,75] (25,75] [19,25] [19,25] [19,25] (25,75] (25,75]
##  [712] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [721] (25,75] [19,25] [19,25] (25,75] [19,25] (25,75] (25,75] [19,25] (25,75]
##  [730] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75]
##  [739] (25,75] (25,75] (25,75] [19,25] (25,75] [19,25] (25,75] (25,75] [19,25]
##  [748] (25,75] (25,75] (25,75] (25,75] [19,25] [19,25] [19,25] (25,75] (25,75]
##  [757] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75]
##  [766] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75]
##  [775] (25,75] [19,25] (25,75] [19,25] (25,75] (25,75] [19,25] (25,75] (25,75]
##  [784] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [793] (25,75] (25,75] (25,75] [19,25] (25,75] [19,25] (25,75] (25,75] (25,75]
##  [802] (25,75] [19,25] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75] (25,75]
##  [811] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75]
##  [820] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [829] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] [19,25] (25,75] [19,25]
##  [838] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
##  [847] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [856] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [865] [19,25] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
##  [874] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [883] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
##  [892] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [901] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75]
##  [910] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [919] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25]
##  [928] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] [19,25] (25,75]
##  [937] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [946] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75]
##  [955] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
##  [964] (25,75] [19,25] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75]
##  [973] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75]
##  [982] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75]
##  [991] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
## [1000] (25,75]
## Levels: [19,25] < (25,75]
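
Once the cut point is known, the same labels can be produced with base R, which is handy for adding the binned variable to a data frame. This is a sketch under the assumption that the break points 19, 25 and 75 from the output above are reused; the `age` vector is a hypothetical sample.

```r
# A few example ages (hypothetical), including both boundary values
age <- c(19, 22, 25, 26, 33, 75)

# Right-closed intervals; include.lowest = TRUE closes the first one on the left
age_bin <- cut(age, breaks = c(19, 25, 75),
               include.lowest = TRUE, ordered_result = TRUE)

print(age_bin)
print(table(age_bin))
```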

For more information on the ‘dlookr’ package, see its home page, which includes worked examples.