Para isso, utilizaremos a Regressão Logística, pois permite analisar variáveis dependentes binomiais (0 ou 1). Essas variáveis categóricas podem ser reduzidas para duas categorias (positivo/negativo; presente/ausente; 0/1).
odds do evento
\(\frac{p(evento)}{1-p(evento)} = e^{b_0 + b_1.X_1+...+b_n.X_n}\)
Primeiramente, colocamos em distaque a probabilidade do evento \(p(evento) = \frac{1}{1 + e^-{(b_0 + b_1.X_1+...+b_n.X_n)}}\)
Estudando essa função, concluimos que ela varia de 0 a 1. Então, utilizando essa função podemos prever o fenomeno ocorre no presente dado de speed dating. Abaixo temos o código R para responde essa pergunta da atividade L5P3: Regressão romântica
dados_romanticos = read_csv(("https://raw.githubusercontent.com/nazareno/ciencia-de-dados-1/master/5-regressao/speed-dating/speed-dating2.csv"),
col_types = cols(
.default = col_double(),
field = col_character(),
from = col_character(),
career = col_character(),
attr3_s = col_logical(),
sinc3_s = col_logical(),
intel3_s = col_logical(),
fun3_s = col_logical(),
amb3_s = col_logical(),
dec = col_character()
))%>%
mutate(dec = as.factor(dec))
glimpse(dados_romanticos)
## Rows: 4,918
## Columns: 44
## $ iid <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3~
## $ gender <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ order <dbl> 4, 3, 10, 5, 7, 6, 1, 2, 8, 9, 10, 9, 6, 1, 3, 2, 7, 8, 4, 5,~
## $ pid <dbl> 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 11, 12, 13, 14, 15, 1~
## $ int_corr <dbl> 0.14, 0.54, 0.16, 0.61, 0.21, 0.25, 0.34, 0.50, 0.28, -0.36, ~
## $ samerace <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1~
## $ age_o <dbl> 27, 22, 22, 23, 24, 25, 30, 27, 28, 24, 27, 22, 22, 23, 24, 2~
## $ age <dbl> 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 24, 24, 24, 24, 24, 2~
## $ field <chr> "Law", "Law", "Law", "Law", "Law", "Law", "Law", "Law", "Law"~
## $ race <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2~
## $ from <chr> "Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Chica~
## $ career <chr> "lawyer", "lawyer", "lawyer", "lawyer", "lawyer", "lawyer", "~
## $ sports <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3~
## $ tvsports <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 8~
## $ exercise <dbl> 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7~
## $ dining <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10,~
## $ museums <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 5~
## $ art <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5~
## $ hiking <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 8~
## $ gaming <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4~
## $ clubbing <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 5~
## $ reading <dbl> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 10, 10, 10, 10, 10, 10, 10, 10,~
## $ tv <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8~
## $ theater <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 7~
## $ movies <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 8, 8, 8, 8, 8, 8, 8, ~
## $ concerts <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 7, 7, 7, 7, 7, 7, 7, ~
## $ music <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 5~
## $ shopping <dbl> 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 8~
## $ yoga <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 7~
## $ attr <dbl> 6, 7, 5, 7, 5, 4, 7, 4, 7, 5, 5, 8, 5, 7, 6, 8, 7, 5, 7, 6, 7~
## $ sinc <dbl> 9, 8, 8, 6, 6, 9, 6, 9, 6, 6, 7, 5, 8, 9, 8, 7, 5, 8, 6, 7, 9~
## $ intel <dbl> 7, 7, 9, 8, 7, 7, 7, 7, 8, 6, 8, 6, 9, 7, 7, 8, 9, 7, 8, 8, 1~
## $ fun <dbl> 7, 8, 8, 7, 7, 4, 4, 6, 9, 8, 4, 6, 6, 6, 9, 3, 6, 5, 9, 7, 7~
## $ amb <dbl> 6, 5, 5, 6, 6, 6, 6, 5, 8, 10, 6, 9, 3, 5, 7, 6, 7, 9, 4, 9, ~
## $ shar <dbl> 5, 6, 7, 8, 6, 4, 7, 6, 8, 8, 3, 6, 4, 7, 8, 2, 9, 5, 5, 8, 9~
## $ like <dbl> 7, 7, 7, 7, 6, 6, 6, 6, 7, 6, 6, 7, 6, 7, 8, 6, 8, 5, 5, 8, 8~
## $ prob <dbl> 6, 5, NA, 6, 6, 5, 5, 7, 7, 6, 4, 3, 7, 8, 6, 5, 7, 6, 6, 7, ~
## $ match_es <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, N~
## $ attr3_s <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ sinc3_s <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ intel3_s <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ fun3_s <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ amb3_s <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ dec <fct> yes, yes, yes, yes, yes, no, yes, no, yes, yes, no, no, no, y~
Regressão Logística foi utilizada para analisar a associação entre variáveis dec que diz se houve match entre os dois participantes do encontro, age que diz a idade do participante 1, age_o que diz a idade do participante 2, sinc quão sincero o participante 1 achou do participante 2 e attr quão atraente o participante 1 achou do participante 2. O formato da função resultante odds: \(p(dec) = \frac{1}{1 + e^-{(0.013 + 0.993.age + 0.992.age_o + 1.03.sinc + 1.96.attr)}}\).
Portanto, ser atraente (attr) aumenta 1.96 de chance com ICs 95% [1.86;2.05], sincero (sinc) aumenta 1.03 de chance com ICs 95% [0.98;1.07], idade (age) do participante 1 aumenta 0.993 de chance com ICs 95% [0.97;1.01] e idade (age_o) do participante 2 aumenta 0.992 de chance com ICs 95% [0.97;1.01]. McFadden = 19% explica a variação da probabilidade.
model = glm(dec ~ age + age_o + sinc + attr,
data = dados_romanticos,
family = "binomial")
tidy(model, conf.int = TRUE, exponentiate = TRUE) %>% select(-statistic, -p.value)
## # A tibble: 5 x 5
## term estimate std.error conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.0138 0.407 0.00617 0.0305
## 2 age 0.993 0.0105 0.972 1.01
## 3 age_o 0.992 0.0103 0.972 1.01
## 4 sinc 1.03 0.0221 0.986 1.07
## 5 attr 1.96 0.0246 1.87 2.05
glance(model)
## # A tibble: 1 x 8
## null.deviance df.null logLik AIC BIC deviance df.residual nobs
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 6362. 4661 -2545. 5100. 5132. 5090. 4657 4662
pR2(model)
## fitting null model for pseudo-r2
## llh llhNull G2 McFadden r2ML
## -2545.1299317 -3180.7969512 1271.3340389 0.1998452 0.2386799
## r2CU
## 0.3205876
model %>%
augment(type.predict = "response")
## # A tibble: 4,662 x 12
## .rownames dec age age_o sinc attr .fitted .resid .std.resid .hat
## <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 yes 21 27 9 6 0.406 1.34 1.34 0.00141
## 2 2 yes 21 22 8 7 0.575 1.05 1.05 0.00124
## 3 3 yes 21 22 8 5 0.261 1.64 1.64 0.00130
## 4 4 yes 21 23 6 7 0.559 1.08 1.08 0.00129
## 5 5 yes 21 24 6 5 0.247 1.67 1.67 0.000959
## 6 6 no 21 25 9 4 0.154 -0.577 -0.578 0.00131
## 7 7 yes 21 30 6 7 0.544 1.10 1.10 0.00183
## 8 8 no 21 27 9 4 0.151 -0.573 -0.573 0.00132
## 9 9 yes 21 28 6 7 0.549 1.10 1.10 0.00141
## 10 10 yes 21 24 6 5 0.247 1.67 1.67 0.000959
## # ... with 4,652 more rows, and 2 more variables: .sigma <dbl>, .cooksd <dbl>