Basic operations in R - part 4.
Data cleaning
Homework
Using the “germancredit” dataset, which describes the credit rating (creditability) of selected customers of a certain bank:
Does the dataset contain missing observations?
Bin the variable “age.in.years” (age in years) by the credit rating “creditability”.
Report and interpret the information indices. Assess the skewness of the quantitative variables.
Check whether the variable “age.in.years” (age in years) contains outliers. If it does, impute them using a method of your choice.
# Assumed packages: scorecard (germancredit data), dlookr (binning, skewness, imputation), VIM (aggr)
library(scorecard)
library(dlookr)
library(VIM)
data("germancredit")
attach(germancredit)
sum(is.na(germancredit)) # count missing observations
## [1] 0
aggr(germancredit) # plot the missing-value pattern
bin <- binning_by(germancredit, y = "creditability", x = "age.in.years") # bin age by credit rating
## Warning in binning_by(germancredit, y = "creditability", x = "age.in.years"): The factor y has been changed to a numeric vector consisting of 0 and 1.
## 'good' changed to 1 (positive) and 'bad' changed to 0 (negative).
plot(bin)
summary(bin) # clients in the (25,75] age bin have better creditworthiness: odds of a good rating are 2.68, versus 1.38 in the [19,25] bin
## ── Binning Table ──────────────────────── Several Metrics ──
## Bin CntRec CntPos CntNeg RatePos RateNeg Odds WoE IV JSD
## 1 [19,25] 190 110 80 0.15714 0.26667 1.37500 -0.52884 0.05792 0.00716
## 2 (25,75] 810 590 220 0.84286 0.73333 2.68182 0.13920 0.01525 0.00190
## 3 Total 1000 700 300 1.00000 1.00000 2.33333 NA 0.07317 0.00906
## AUC
## 1 0.02095
## 2 0.42429
## 3 0.44524
##
## ── General Metrics ─────────────────────────────────────────
## • Gini index : -0.10952
## • IV (Jeffrey) : 0.07317
## • JS (Jensen-Shannon) Divergence : 0.00906
## • Kolmogorov-Smirnov Statistics : 0.10952
## • HHI (Herfindahl-Hirschman Index) : 0.6922
## • HHI (normalized) : 0.3844
## • Cramer's V : 0.12794
##
## ── Significance Tests ──────────────────── Chisquare Test ──
## Bin A Bin B statistics p_value
## 1 [19,25] (25,75] 16.3681 0.0000521562
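The WoE and IV columns of the binning table can be reproduced by hand, which helps when interpreting them. The sketch below uses base R only and copies the positive and negative counts from the table above. It assumes the convention WoE = ln(RatePos / RateNeg) and IV = (RatePos - RateNeg) * WoE; this reproduces the printed values but is not taken from the dlookr source. All variable names are illustrative.

```r
# Counts copied from the binning table above
cnt_pos <- c("[19,25]" = 110, "(25,75]" = 590)  # 'good' ratings per bin
cnt_neg <- c("[19,25]" = 80,  "(25,75]" = 220)  # 'bad' ratings per bin

# Share of all positives / all negatives that falls into each bin
rate_pos <- cnt_pos / sum(cnt_pos)
rate_neg <- cnt_neg / sum(cnt_neg)

# Assumed convention: WoE = ln(RatePos / RateNeg)
woe <- log(rate_pos / rate_neg)

# Per-bin contribution to the information value, and the total
iv_bin <- (rate_pos - rate_neg) * woe
iv_total <- sum(iv_bin)

print(round(woe, 5))
print(round(iv_bin, 5))
print(round(iv_total, 5))
```

A negative WoE for [19,25] means that this bin holds a larger share of the bad ratings than of the good ones. By a commonly quoted rule of thumb, a total IV between 0.02 and 0.1 indicates weak predictive power, so age split into these two bins would be only a weak predictor of creditability.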
find_skewness(germancredit, value = TRUE, thres = 0.1) # assess the skewness of the quantitative variables
## duration.in.month
## 1.093
## credit.amount
## 1.947
## installment.rate.in.percentage.of.disposable.income
## -0.531
## present.residence.since
## -0.272
## age.in.years
## 1.019
## number.of.existing.credits.at.this.bank
## 1.271
## number.of.people.being.liable.to.provide.maintenance.for
## 1.907
# Skewness exceeds 1 for five of the seven quantitative variables listed, which is high and indicates strongly right-skewed distributions. The highest value, 1.947, is for credit.amount. A plausible explanation is that most loans are for small amounts while a few are very large.
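To make the skewness figures less of a black box, here is a minimal base R sketch of the moment-based skewness coefficient (third central moment divided by the second central moment to the power 3/2). The function name `moment_skewness` and the toy vectors are hypothetical, and this is not necessarily the estimator that dlookr applies, so small differences from the values above are to be expected.

```r
# Moment-based sample skewness: m3 / m2^(3/2)
moment_skewness <- function(x) {
  x <- x[!is.na(x)]          # drop missing values
  d <- x - mean(x)           # deviations from the mean
  mean(d^3) / mean(d^2)^1.5
}

# Hypothetical toy data: many small amounts and one large one
amounts <- c(1, 1, 2, 2, 3, 3, 4, 20)
# Hypothetical symmetric sample for comparison
sym <- c(1, 2, 3, 4, 5)

print(moment_skewness(amounts))  # positive: long right tail
print(moment_skewness(sym))      # zero for a symmetric sample
```

A single large value is enough to push the coefficient well above zero, which is the same pattern suggested for credit.amount.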
boxplot(germancredit$age.in.years) # check for outliers
wiek <- imputate_outlier(germancredit, age.in.years, method = "capping") # impute outliers by capping
summary(wiek)
## Impute outliers with capping
##
## * Information of Imputation (before vs after)
## Original Imputation
## described_variables "value" "value"
## n "1000" "1000"
## na "0" "0"
## mean "35.546" "35.350"
## sd "11.3755" "10.8530"
## se_mean "0.359724" "0.343202"
## IQR "15" "15"
## skewness "1.020739" "0.821878"
## kurtosis " 0.595780" "-0.132573"
## p00 "19" "19"
## p01 "20" "20"
## p05 "22" "22"
## p10 "23" "23"
## p20 "26" "26"
## p25 "27" "27"
## p30 "28" "28"
## p40 "30" "30"
## p50 "33" "33"
## p60 "36" "36"
## p70 "39" "39"
## p75 "42" "42"
## p80 "45" "45"
## p90 "52" "52"
## p95 "60" "60"
## p99 "67.01" "63.00"
## p100 "75" "64"
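The before/after summary is consistent with capping understood as follows: values beyond the boxplot fences (1.5 * IQR from the quartiles) are replaced by the 5th or 95th percentile. The base R sketch below illustrates that idea under this assumption. The function `cap_outliers` and the `ages` vector are hypothetical, and `quantile()` may place the fences slightly differently from the boxplot-based rule used inside dlookr.

```r
# Replace values outside the Tukey fences with the 5th / 95th percentile
cap_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  fence_lo <- q[[1]] - coef * iqr
  fence_hi <- q[[2]] + coef * iqr
  # Caps are computed on the original data, before any replacement
  caps <- quantile(x, c(0.05, 0.95), na.rm = TRUE)
  x[!is.na(x) & x < fence_lo] <- caps[[1]]
  x[!is.na(x) & x > fence_hi] <- caps[[2]]
  x
}

# Hypothetical ages: 19 to 60, plus one value far above the rest
ages <- c(19:60, 95)
capped <- cap_outliers(ages)

print(max(ages))
print(max(capped))
print(sum(capped != ages))  # number of values that were replaced
```

Because only the flagged values are replaced, the largest value after capping is the largest observation that was still inside the fences, not the 95th percentile itself. The same effect is visible in the summary above, where p100 drops from 75 to 64 while p95 stays at 60.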
plot(wiek)
extract(bin)
## [1] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [10] (25,75] [19,25] [19,25] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25]
## [19] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [28] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
## [37] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75]
## [46] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75]
## [55] (25,75] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75]
## [64] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75] [19,25] (25,75] (25,75]
## [73] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75]
## [82] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [91] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75]
## [100] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75]
## [109] (25,75] (25,75] (25,75] [19,25] (25,75] [19,25] (25,75] (25,75] (25,75]
## [118] (25,75] [19,25] (25,75] [19,25] (25,75] [19,25] (25,75] (25,75] (25,75]
## [127] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25]
## [136] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
## [145] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
## [154] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [163] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75]
## [172] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [181] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
## [190] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
## [199] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
## [208] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [217] (25,75] [19,25] [19,25] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
## [226] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75] [19,25]
## [235] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
## [244] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75]
## [253] [19,25] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75]
## [262] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
## [271] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [280] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [289] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
## [298] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [307] [19,25] (25,75] [19,25] [19,25] (25,75] [19,25] (25,75] [19,25] (25,75]
## [316] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [325] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
## [334] [19,25] [19,25] (25,75] [19,25] [19,25] (25,75] (25,75] [19,25] (25,75]
## [343] [19,25] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] [19,25]
## [352] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25]
## [361] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] [19,25] (25,75]
## [370] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75]
## [379] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] [19,25] [19,25]
## [388] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75]
## [397] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [406] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
## [415] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] [19,25] (25,75] (25,75]
## [424] (25,75] [19,25] [19,25] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
## [433] (25,75] (25,75] [19,25] [19,25] [19,25] (25,75] (25,75] (25,75] (25,75]
## [442] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [451] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [460] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [469] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25]
## [478] [19,25] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75]
## [487] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [496] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [505] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [514] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25]
## [523] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [532] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75]
## [541] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
## [550] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75]
## [559] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75]
## [568] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] [19,25] (25,75] [19,25]
## [577] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [586] [19,25] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
## [595] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [604] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75]
## [613] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75]
## [622] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [631] (25,75] (25,75] [19,25] [19,25] [19,25] (25,75] (25,75] [19,25] (25,75]
## [640] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [649] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75]
## [658] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75] (25,75]
## [667] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75]
## [676] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [685] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75]
## [694] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
## [703] (25,75] (25,75] (25,75] (25,75] [19,25] [19,25] [19,25] (25,75] (25,75]
## [712] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [721] (25,75] [19,25] [19,25] (25,75] [19,25] (25,75] (25,75] [19,25] (25,75]
## [730] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75]
## [739] (25,75] (25,75] (25,75] [19,25] (25,75] [19,25] (25,75] (25,75] [19,25]
## [748] (25,75] (25,75] (25,75] (25,75] [19,25] [19,25] [19,25] (25,75] (25,75]
## [757] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75]
## [766] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75]
## [775] (25,75] [19,25] (25,75] [19,25] (25,75] (25,75] [19,25] (25,75] (25,75]
## [784] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [793] (25,75] (25,75] (25,75] [19,25] (25,75] [19,25] (25,75] (25,75] (25,75]
## [802] (25,75] [19,25] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75] (25,75]
## [811] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75]
## [820] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [829] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] [19,25] (25,75] [19,25]
## [838] [19,25] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
## [847] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [856] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [865] [19,25] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
## [874] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [883] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75]
## [892] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [901] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75]
## [910] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75]
## [919] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25]
## [928] (25,75] (25,75] (25,75] [19,25] [19,25] (25,75] (25,75] [19,25] (25,75]
## [937] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [946] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75]
## [955] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75]
## [964] (25,75] [19,25] (25,75] [19,25] (25,75] (25,75] (25,75] [19,25] (25,75]
## [973] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75]
## [982] (25,75] (25,75] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75]
## [991] (25,75] (25,75] [19,25] (25,75] (25,75] (25,75] (25,75] (25,75] [19,25]
## [1000] (25,75]
## Levels: [19,25] < (25,75]
For more information on the ‘dlookr’ package, see its home page, which includes worked examples.