Problemi simule etmek icin mtcars veri kumesini kullanacagim. Sizin orneginizde glm icin family binomial olarak secilmis, benzer olmasi icin iris veri kumesinde response degiskenim vs sutunu olacak
library(tidyverse)
library(rsample)
library(broom)
library(magrittr)
mtcars %>% head()
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
benzer sekilde 20 ornek veri olusturmak icin rsample::bootstraps fonksiyonunu kullancagim
sample20 <-
bootstraps(mtcars, times=20) %>%
rowwise() %>%
mutate(data_sample=(list(analysis(splits)))) %>%
select(id, data_sample) %>%
as_tibble()
sample20
## # A tibble: 20 x 2
## id data_sample
## <chr> <list>
## 1 Bootstrap01 <df[,11] [32 Ć 11]>
## 2 Bootstrap02 <df[,11] [32 Ć 11]>
## 3 Bootstrap03 <df[,11] [32 Ć 11]>
## 4 Bootstrap04 <df[,11] [32 Ć 11]>
## 5 Bootstrap05 <df[,11] [32 Ć 11]>
## 6 Bootstrap06 <df[,11] [32 Ć 11]>
## 7 Bootstrap07 <df[,11] [32 Ć 11]>
## 8 Bootstrap08 <df[,11] [32 Ć 11]>
## 9 Bootstrap09 <df[,11] [32 Ć 11]>
## 10 Bootstrap10 <df[,11] [32 Ć 11]>
## 11 Bootstrap11 <df[,11] [32 Ć 11]>
## 12 Bootstrap12 <df[,11] [32 Ć 11]>
## 13 Bootstrap13 <df[,11] [32 Ć 11]>
## 14 Bootstrap14 <df[,11] [32 Ć 11]>
## 15 Bootstrap15 <df[,11] [32 Ć 11]>
## 16 Bootstrap16 <df[,11] [32 Ć 11]>
## 17 Bootstrap17 <df[,11] [32 Ć 11]>
## 18 Bootstrap18 <df[,11] [32 Ć 11]>
## 19 Bootstrap19 <df[,11] [32 Ć 11]>
## 20 Bootstrap20 <df[,11] [32 Ć 11]>
sample20$data_sample[[1]] %>% as_tibble()
## # A tibble: 32 x 11
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
## 2 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
## 3 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
## 4 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
## 5 16.4 8 276. 180 3.07 4.07 17.4 0 0 3 3
## 6 15 8 301 335 3.54 3.57 14.6 0 1 5 8
## 7 21 6 160 110 3.9 2.62 16.5 0 1 4 4
## 8 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
## 9 10.4 8 460 215 3 5.42 17.8 0 0 3 4
## 10 26 4 120. 91 4.43 2.14 16.7 0 1 5 2
## # ⦠with 22 more rows
Yukarida gordugunuz gibi her bir data_sample satiri birer data.frame
sample20 20 ayri mtcars veri setinin bootstrapindan olusan bir data frame.
Asagida general linear model fonksiyonumu tanimlayacagim fit_glm
fit_glm <- function(data_) {
glm(formula= vs ~ wt + disp, data=data_, family=binomial)
}
purrr::map fonsiyonu ile de fit_glm fonksiyonunu her bir bootstrap orneklemi icin uygulayip bir fit sutununa kaydecegim. fit sutunundaki her satir birer glm modeli
sample20 %<>%
mutate(fit = map(data_sample, fit_glm))
sample20
## # A tibble: 20 x 3
## id data_sample fit
## <chr> <list> <list>
## 1 Bootstrap01 <df[,11] [32 Ć 11]> <glm>
## 2 Bootstrap02 <df[,11] [32 Ć 11]> <glm>
## 3 Bootstrap03 <df[,11] [32 Ć 11]> <glm>
## 4 Bootstrap04 <df[,11] [32 Ć 11]> <glm>
## 5 Bootstrap05 <df[,11] [32 Ć 11]> <glm>
## 6 Bootstrap06 <df[,11] [32 Ć 11]> <glm>
## 7 Bootstrap07 <df[,11] [32 Ć 11]> <glm>
## 8 Bootstrap08 <df[,11] [32 Ć 11]> <glm>
## 9 Bootstrap09 <df[,11] [32 Ć 11]> <glm>
## 10 Bootstrap10 <df[,11] [32 Ć 11]> <glm>
## 11 Bootstrap11 <df[,11] [32 Ć 11]> <glm>
## 12 Bootstrap12 <df[,11] [32 Ć 11]> <glm>
## 13 Bootstrap13 <df[,11] [32 Ć 11]> <glm>
## 14 Bootstrap14 <df[,11] [32 Ć 11]> <glm>
## 15 Bootstrap15 <df[,11] [32 Ć 11]> <glm>
## 16 Bootstrap16 <df[,11] [32 Ć 11]> <glm>
## 17 Bootstrap17 <df[,11] [32 Ć 11]> <glm>
## 18 Bootstrap18 <df[,11] [32 Ć 11]> <glm>
## 19 Bootstrap19 <df[,11] [32 Ć 11]> <glm>
## 20 Bootstrap20 <df[,11] [32 Ć 11]> <glm>
sample20$fit[[1]]
##
## Call: glm(formula = vs ~ wt + disp, family = binomial, data = data_)
##
## Coefficients:
## (Intercept) wt disp
## -1.05869 2.92154 -0.04265
##
## Degrees of Freedom: 31 Total (i.e. Null); 29 Residual
## Null Deviance: 42.34
## Residual Deviance: 19.37 AIC: 25.37
Her bir modelin katsayilari (coefficient) duzensiz olarak model icinde bulunmakta yukaridaki ornekte oldugu gibi. broom::tidy fonksiyonu bu katsayilari birer list olarak almaya yariyor.
sample20 %<>%
mutate(coefs = map(fit, tidy))
sample20
## # A tibble: 20 x 4
## id data_sample fit coefs
## <chr> <list> <list> <list>
## 1 Bootstrap01 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 2 Bootstrap02 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 3 Bootstrap03 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 4 Bootstrap04 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 5 Bootstrap05 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 6 Bootstrap06 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 7 Bootstrap07 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 8 Bootstrap08 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 9 Bootstrap09 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 10 Bootstrap10 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 11 Bootstrap11 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 12 Bootstrap12 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 13 Bootstrap13 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 14 Bootstrap14 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 15 Bootstrap15 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 16 Bootstrap16 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 17 Bootstrap17 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 18 Bootstrap18 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 19 Bootstrap19 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
## 20 Bootstrap20 <df[,11] [32 Ć 11]> <glm> <tibble [3 Ć 5]>
sample20$coefs[[1]]
## # A tibble: 3 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -1.06 3.35 -0.316 0.752
## 2 wt 2.92 2.05 1.43 0.153
## 3 disp -0.0427 0.0193 -2.21 0.0270
unnest fonksiyonunu kullanarak listeyi de dataframeāe cevirebiliriz
sample20 %>% unnest(coefs)
## # A tibble: 60 x 6
## id term estimate std.error statistic p.value
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Bootstrap01 (Intercept) -1.06 3.35 -0.316 0.752
## 2 Bootstrap01 wt 2.92 2.05 1.43 0.153
## 3 Bootstrap01 disp -0.0427 0.0193 -2.21 0.0270
## 4 Bootstrap02 (Intercept) 3.62 3.42 1.06 0.290
## 5 Bootstrap02 wt 0.769 1.63 0.471 0.638
## 6 Bootstrap02 disp -0.0293 0.0137 -2.15 0.0318
## 7 Bootstrap03 (Intercept) -697. 169081. -0.00412 0.997
## 8 Bootstrap03 wt 1056. 251581. 0.00420 0.997
## 9 Bootstrap03 disp -13.0 3098. -0.00421 0.997
## 10 Bootstrap04 (Intercept) 3.02 3.38 0.894 0.371
## # ⦠with 50 more rows
broom::augment fonksiyonu tidy fonksiyonuna benzer sekilde bir model nesnesi (fit sutunu) alip birkac bilgi ekliyor. Ornegin .fitted sutunu standart predict fonksiyonunun yaptigini yapiyor, diger sutunlar da istatistiksel diger parametreleri (se.error p.value..) degerleri iceriyor
sample20 <- sample20 %>%
mutate(augmented = map(fit, augment, type.predict='response'))
sample20$augmented[[1]]
## # A tibble: 32 x 11
## .rownames vs wt disp .fitted .se.fit .resid .hat .sigma
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Hornet S⦠0 3.44 360 0.00172 0.00489 -0.0587 0.0139 0.832
## 2 Chrysler⦠0 5.34 440 0.0146 0.0301 -0.172 0.0629 0.831
## 3 Valiant 1 3.46 225 0.366 0.163 1.42 0.114 0.782
## 4 Duster 3⦠0 3.57 360 0.00251 0.00658 -0.0709 0.0173 0.832
## 5 Merc 450⦠0 4.07 276. 0.283 0.211 -0.815 0.220 0.813
## 6 Maserati⦠0 3.57 301 0.0302 0.0460 -0.248 0.0721 0.830
## 7 Mazda RX4 0 2.62 160 0.443 0.193 -1.08 0.152 0.802
## 8 Ferrari ⦠0 2.77 145 0.700 0.128 -1.55 0.0777 0.774
## 9 Lincoln ⦠0 5.42 460 0.00790 0.0175 -0.126 0.0392 0.831
## 10 Porsche ⦠0 2.14 120. 0.516 0.264 -1.20 0.280 0.787
## # ⦠with 22 more rows, and 2 more variables: .cooksd <dbl>,
## # .std.resid <dbl>
vs degeri orijinal degeri verirken .fitted da modelin hesapladigi degeri veriyor. Ikisinin arasindaki farki gormek icin:
sample20$augmented[[1]] %>%
ggplot(aes(wt, disp)) +
geom_point(aes(color=vs))
sample20$augmented[[1]] %>%
ggplot(aes(wt, disp)) +
geom_point(aes(color=.fitted))
broom:glance fonksiyonuna da benzer sekilde bakmanizi oneririm, belki aradiginiz istatistiksel deger pR2 paketine bakmadan burada var olabilir
sample20$fit[[1]] %>% glance()
## # A tibble: 1 x 7
## null.deviance df.null logLik AIC BIC deviance df.residual
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int>
## 1 42.3 31 -9.68 25.4 29.8 19.4 29
Sizin takildiginiz probleme yani pR2 and margin degerlerini ekleme problemine ben de takildim. Normalde mutate map ikilisini kullanarak yeni bir sutun eklemek kolay olmasi gerek diye dusundum, ancak dusundugum gibi olmadi.
sample20 %>%
mutate(pr2 = map(fit, pscl::pR2))
## Error in is.data.frame(data): object 'data_' not found
pR2 fonksiyonun icine tek bir model verince bile ayni hatayi aliyorum
sample20$fit[[1]]
##
## Call: glm(formula = vs ~ wt + disp, family = binomial, data = data_)
##
## Coefficients:
## (Intercept) wt disp
## -1.05869 2.92154 -0.04265
##
## Degrees of Freedom: 31 Total (i.e. Null); 29 Residual
## Null Deviance: 42.34
## Residual Deviance: 19.37 AIC: 25.37
pscl::pR2(sample20$fit[[1]])
## Error in is.data.frame(data): object 'data_' not found
Gorunuse gore pR2 fonksiyonu formulden ote data.frameāin kendisini de istiyor, o yuzden hata veriyor. data_ adinda bir dataframe ariyor, oysa ki data_ fit_glm fonksiyonu icin olan bir degisken, o da glm formulundeki data argumani degiskeni.
ornegin datayi explicit olarak belirtirsek sample20$data_sample[[1]] gibi, o zaman calisiyor
model <-
glm(formula= vs ~ wt + disp,
data=sample20$data_sample[[1]],
family=binomial)
model
##
## Call: glm(formula = vs ~ wt + disp, family = binomial, data = sample20$data_sample[[1]])
##
## Coefficients:
## (Intercept) wt disp
## -1.05869 2.92154 -0.04265
##
## Degrees of Freedom: 31 Total (i.e. Null); 29 Residual
## Null Deviance: 42.34
## Residual Deviance: 19.37 AIC: 25.37
pscl::pR2(model)
## llh llhNull G2 McFadden r2ML r2CU
## -9.6837510 -21.1700236 22.9725453 0.5425725 0.5122206 0.6981349
Benim anladigim kadariyla bu pR2 modelinin icindeki bir durumla ilgili. Ayni durum maalesef margin icin de gecerli
Belki paket yazarlarina github uzerinde bir issue acarsaniz, bu konuda yardimci olabilirler.
Onun disinda maalesef ben de for dongusu disinda daha kolay bir yontem goremedim