library(tidyverse)
library(broom)
data = read_csv("https://raw.githubusercontent.com/nazareno/ciencia-de-dados-1/master/5-regressao/speed-dating/speed-dating2.csv", col_types =
cols(
.default = col_double(),
field = col_character(),
from = col_character(),
career = col_character(),
dec = col_character()
)) %>%
mutate(dec = as.factor(dec), gender = as.factor(gender))
Os dados que iremos utilizar para essa análise são dados que descrevem 5000 encontros relâmpagos ou speed dating. Esses encontros ocorrem em uma duração de 4 minutos e envolveram 310 jovens dos Estados Unidos. Logo após esses encontros, os jovens preenchiam uma ficha avaliando as pessoas com quem eles se encontraram.
As fichas preenchidas continham diversas informações, entre elas estão:
- attr: Indica o quão atraente a pessoa_1 achou da pessoa_2
- sinc: Indica o quão sincera a pessoa_1 achou da pessoa_2
- intel: Indica o quão inteligente a pessoa_1 achou da pessoa_2
- fun: Indica o quão divertida a pessoa_1 achou da pessoa_2
- amb: Indica o quão ambiciosa a pessoa_1 achou da pessoa_2
- shar: Indica o quanto a pessoa_1 compartilha de interesses semelhantes com a pessoa_2
A informação que queremos entender desses dados é se elas são um conjunto de variáveis explicativas para predizer o valor de ‘dec’. E o que seria o dec ? É simplesmente se houve um match entre as pessoas nesse encontro.
Para começar, vamos visualizar a distribuição dos valores de cada uma dessas variáveis em relação ao ‘dec’.
Attr:
data %>%
ggplot(aes(x = dec, y = attr)) +
geom_violin(aes(fill = dec)) +
geom_count() +
coord_flip()
Warning: Removed 118 rows containing non-finite values (stat_ydensity).
Warning: Removed 118 rows containing non-finite values (stat_sum).

Sinc:
data %>%
ggplot(aes(x = dec, y = sinc)) +
geom_violin(aes(fill = dec)) +
geom_count() +
coord_flip()
Warning: Removed 161 rows containing non-finite values (stat_ydensity).
Warning: Removed 161 rows containing non-finite values (stat_sum).

Intel:
data %>%
ggplot(aes(x = dec, y = intel)) +
geom_violin(aes(fill = dec)) +
geom_count() +
coord_flip()
Warning: Removed 166 rows containing non-finite values (stat_ydensity).
Warning: Removed 166 rows containing non-finite values (stat_sum).

Fun:
data %>%
ggplot(aes(x = dec, y = fun)) +
geom_violin(aes(fill = dec)) +
geom_count() +
coord_flip()
Warning: Removed 197 rows containing non-finite values (stat_ydensity).
Warning: Removed 197 rows containing non-finite values (stat_sum).

Amb:
data %>%
ggplot(aes(x = dec, y = amb)) +
geom_violin(aes(fill = dec)) +
geom_count() +
coord_flip()
Warning: Removed 421 rows containing non-finite values (stat_ydensity).
Warning: Removed 421 rows containing non-finite values (stat_sum).

Shar:
data %>%
ggplot(aes(x = dec, y = shar)) +
geom_violin(aes(fill = dec)) +
geom_count() +
coord_flip()
Warning: Removed 643 rows containing non-finite values (stat_ydensity).
Warning: Removed 643 rows containing non-finite values (stat_sum).

É interessante ver que para quase todo, as variáveis influenciaram de alguma maneira, nem que seja pouca, para que a pessoa_1 marcasse um match com a pessoa_2. Um variável que das observadas acima que não demonstraram uma diferença visual foi a de Ambição, algo bem interessante!
Podemos portanto, iniciar nosso planejamento em criar um modelo de regressão logística que realize um treinamento sobre os dados presentes e nos informe os valores de coeficientes encontrados!
model_matches = glm(dec ~ attr + sinc + intel + fun + amb + shar,
data = data,
family = "binomial")
tidy(model_matches, conf.int = TRUE, exponentiate = TRUE)
summary(model_matches, conf.int = TRUE, exponentiate = TRUE)
Call:
glm(formula = dec ~ attr + sinc + intel + fun + amb + shar, family = "binomial",
data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.5362 -0.8250 -0.3011 0.8392 3.4541
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.104342 0.243021 -21.004 < 2e-16 ***
attr 0.562454 0.028511 19.727 < 2e-16 ***
sinc -0.129343 0.031508 -4.105 4.04e-05 ***
intel 0.006912 0.038536 0.179 0.858
fun 0.258122 0.030706 8.406 < 2e-16 ***
amb -0.193363 0.030457 -6.349 2.17e-10 ***
shar 0.323160 0.024189 13.360 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 5625.5 on 4127 degrees of freedom
Residual deviance: 4139.8 on 4121 degrees of freedom
(790 observations deleted due to missingness)
AIC: 4153.8
Number of Fisher Scoring iterations: 5
Dado o resultado calculado pelo modelo acima, podemos ver que o impacto positivo estão mais associados as variáveis de ‘fun’, ‘attr’ e ‘shar’, ou seja, pessoas divertidas, atraentes e que compartilham os mesmos interesses respectivamente.
Para os valores estimados pelo nosso modelo de regressão logística, temos os seguintes resultados:
attr: Sempre que o attr aumentar uma unidade, as chances do match ocorrer é de 75%. Segundo a nossa amostra, o IC encontrado com uma confiança de 95% pode variar entre [1.63, 1.82], ou seja, chances de 63% a 82%.
sinc: Sempre que o sinc aumentar uma unidade, as chances do match ocorrer é de -12,93%, (os valores podem ser melhor vistos na sumarização disposta na célula acima). Segundo a nossa amostra, o IC encontrado com uma confiança de 95% pode variar entre [0.825, 0.934], ou seja, chances de -0.1740% a -0.0654%.
intel: Sempre que o intel aumentar uma unidade, as chances do match ocorrer é de 0.69%. Segundo a nossa amostra, o IC encontrado com uma confiança de 95% pode variar entre [0.933, 1.085], ou seja, chances de -0.66% a 0.85%.
fun: Sempre que o fun aumentar uma unidade, as chances do match ocorrer é de 29,44%. Segundo a nossa amostra, o IC encontrado com uma confiança de 95% pode variar entre [1.21, 1.37], ou seja, chances de 21% a 37%.
amb: Sempre que o amb aumentar uma unidade, as chances do match ocorrer é de -17,59%. Segundo a nossa amostra, o IC encontrado com uma confiança de 95% pode variar entre [0.776, 0.874], ou seja, chances de -22,4% a -12,6%.
shar: Sempre que o shar aumentar uma unidade, as chances do match ocorrer é de 38,14%. Segundo a nossa amostra, o IC encontrado com uma confiança de 95% pode variar entre [1.31, 1.4490], ou seja, chances de 31% a 44%.
LS0tDQp0aXRsZTogIlJlZ3Jlc3PDo28gTG9nw61zdGljYTogVXRpbGl6YW5kbyBkYWRvcyByb23Dom50aWNvcyINCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KLS0tDQoNCmBgYHtyfQ0KbGlicmFyeSh0aWR5dmVyc2UpDQpsaWJyYXJ5KGJyb29tKQ0KDQpkYXRhID0gcmVhZF9jc3YoImh0dHBzOi8vcmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbS9uYXphcmVuby9jaWVuY2lhLWRlLWRhZG9zLTEvbWFzdGVyLzUtcmVncmVzc2FvL3NwZWVkLWRhdGluZy9zcGVlZC1kYXRpbmcyLmNzdiIsIGNvbF90eXBlcyA9IA0KICAgICAgICAgICAgICAgY29scygNCiAgICAgICAgICAgICAgICAgICAgICAuZGVmYXVsdCA9IGNvbF9kb3VibGUoKSwNCiAgICAgICAgICAgICAgICAgICAgICBmaWVsZCA9IGNvbF9jaGFyYWN0ZXIoKSwNCiAgICAgICAgICAgICAgICAgICAgICBmcm9tID0gY29sX2NoYXJhY3RlcigpLA0KICAgICAgICAgICAgICAgICAgICAgIGNhcmVlciA9IGNvbF9jaGFyYWN0ZXIoKSwNCiAgICAgICAgICAgICAgICAgICAgICBkZWMgPSBjb2xfY2hhcmFjdGVyKCkNCiAgICAgICAgICAgICAgICAgICAgKSkgJT4lIA0KICAgICAgICBtdXRhdGUoZGVjID0gYXMuZmFjdG9yKGRlYyksIGdlbmRlciA9IGFzLmZhY3RvcihnZW5kZXIpKQ0KYGBgDQoNCk9zIGRhZG9zIHF1ZSBpcmVtb3MgdXRpbGl6YXIgcGFyYSBlc3NhIGFuw6FsaXNlIHPDo28gZGFkb3MgcXVlIGRlc2NyZXZlbSA1MDAwIGVuY29udHJvcyByZWzDom1wYWdvcyBvdSBzcGVlZCBkYXRpbmcuIA0KRXNzZXMgZW5jb250cm9zIG9jb3JyZW0gZW0gdW1hIGR1cmHDp8OjbyBkZSA0IG1pbnV0b3MgZSBlbnZvbHZlcmFtIDMxMCBqb3ZlbnMgZG9zIEVzdGFkb3MgVW5pZG9zLiBMb2dvIGFww7NzIGVzc2VzIGVuY29udHJvcywgb3Mgam92ZW5zIHByZWVuY2hpYW0gdW1hIGZpY2hhIGF2YWxpYW5kbyBhcyBwZXNzb2FzIGNvbSBxdWVtIGVsZXMgc2UgZW5jb250cmFyYW0uDQoNCg0KQXMgZmljaGFzIHByZWVuY2hpZGFzIGNvbnRpbmhhbSBkaXZlcnNhcyBpbmZvcm1hw6fDtWVzLCBlbnRyZSBlbGFzIGVzdMOjbzoNCg0KLSAqYXR0cio6IEluZGljYSBvIHF1w6NvIGF0cmFlbnRlIGEgcGVzc29hXzEgYWNob3UgZGEgcGVzc29hXzIgDQotICpzaW5jKjogSW5kaWNhIG8gcXXDo28gc2luY2VyYSBhIHBlc3NvYV8xIGFjaG91IGRhIHBlc3NvYV8yIA0KLSAqaW50ZWwqOiBJbmRpY2EgbyBxdcOjbyBpbnRlbGlnZW50ZSBhIHBlc3NvYV8xIGFjaG91IGRhIHBlc3NvYV8yIA0KLSAqZnVuKjogSW5kaWNhIG8gcXXDo28gZGl2ZXJ0aWRhIGEgcGVzc29hXzEgYWNob3UgZGEgcGVzc29hXzIgDQotICphbWIqOiBJbmRpY2EgbyBxdcOjbyBhbWJpY2lvc2EgYSBwZXNzb2FfMSBhY2hvdSBkYSBwZXNzb2FfMg0KLSAqc2hhcio6IEluZGljYSBvIHF1YW50byBhIHBlc3NvYV8xIGNvbXBhcnRpbGhhIGRlIGludGVyZXNzZXMgc2VtZWxoYW50ZXMgY29tIGEgcGVzc29hXzINCg0KQSBpbmZvcm1hw6fDo28gcXVlIHF1ZXJlbW9zIGVudGVuZGVyIGRlc3NlcyBkYWRvcyDDqSBzZSBlbGFzIHPDo28gdW0gY29uanVudG8gZGUgdmFyacOhdmVpcyBleHBsaWNhdGl2YXMgcGFyYSBwcmVkaXplciBvIHZhbG9yIGRlICdkZWMnLiBFIG8gcXVlIHNlcmlhIG8gZGVjID8gw4kgc2ltcGxlc21lbnRlIHNlIGhvdXZlIHVtIG1hdGNoIGVudHJlIGFzIHBlc3NvYXMgbmVzc2UgZW5jb250cm8uDQoNClBhcmEgY29tZcOnYXIsIHZhbW9zIHZpc3VhbGl6YXIgYSBkaXN0cmlidWnDp8OjbyBkb3MgdmFsb3JlcyBkZSBjYWRhIHVtYSBkZXNzYXMgdmFyacOhdmVpcyBlbSByZWxhw6fDo28gYW8gJ2RlYycuDQoNCiMjIyMgQXR0cjoNCmBgYHtyfQ0KZGF0YSAlPiUNCiAgICBnZ3Bsb3QoYWVzKHggPSBkZWMsIHkgPSBhdHRyKSkgKw0KICAgIGdlb21fdmlvbGluKGFlcyhmaWxsID0gZGVjKSkgKyANCiAgICBnZW9tX2NvdW50KCkgKw0KICAgIGNvb3JkX2ZsaXAoKQ0KYGBgDQoNCiMjIyMgU2luYzoNCmBgYHtyfQ0KZGF0YSAlPiUNCiAgICBnZ3Bsb3QoYWVzKHggPSBkZWMsIHkgPSBzaW5jKSkgKw0KICAgIGdlb21fdmlvbGluKGFlcyhmaWxsID0gZGVjKSkgKyANCiAgICBnZW9tX2NvdW50KCkgKw0KICAgIGNvb3JkX2ZsaXAoKQ0KYGBgDQoNCiMjIyMgSW50ZWw6DQpgYGB7cn0NCmRhdGEgJT4lDQogICAgZ2dwbG90KGFlcyh4ID0gZGVjLCB5ID0gaW50ZWwpKSArDQogICAgZ2VvbV92aW9saW4oYWVzKGZpbGwgPSBkZWMpKSArIA0KICAgIGdlb21fY291bnQoKSArDQogICAgY29vcmRfZmxpcCgpDQpgYGANCg0KIyMjIyBGdW46DQpgYGB7cn0NCmRhdGEgJT4lDQogICAgZ2dwbG90KGFlcyh4ID0gZGVjLCB5ID0gZnVuKSkgKw0KICAgIGdlb21fdmlvbGluKGFlcyhmaWxsID0gZGVjKSkgKyANCiAgICBnZW9tX2NvdW50KCkgKw0KICAgIGNvb3JkX2ZsaXAoKQ0KYGBgDQoNCiMjIyMgQW1iOg0KYGBge3J9DQpkYXRhICU+JQ0KICAgIGdncGxvdChhZXMoeCA9IGRlYywgeSA9IGFtYikpICsNCiAgICBnZW9tX3Zpb2xpbihhZXMoZmlsbCA9IGRlYykpICsgDQogICAgZ2VvbV9jb3VudCgpICsNCiAgICBjb29yZF9mbGlwKCkNCmBgYA0KDQojIyMjIFNoYXI6DQpgYGB7cn0NCmRhdGEgJT4lDQogICAgZ2dwbG90KGFlcyh4ID0gZGVjLCB5ID0gc2hhcikpICsNCiAgICBnZW9tX3Zpb2xpbihhZXMoZmlsbCA9IGRlYykpICsgDQogICAgZ2VvbV9jb3VudCgpICsNCiAgICBjb29yZF9mbGlwKCkNCmBgYA0KDQrDiSBpbnRlcmVzc2FudGUgdmVyIHF1ZSBwYXJhIHF1YXNlIHRvZG8sIGFzIHZhcmnDoXZlaXMgaW5mbHVlbmNpYXJhbSBkZSBhbGd1bWEgbWFuZWlyYSwgbmVtIHF1ZSBzZWphIHBvdWNhLCBwYXJhIHF1ZSBhIHBlc3NvYV8xIG1hcmNhc3NlIHVtIG1hdGNoIGNvbSBhIHBlc3NvYV8yLiBVbSB2YXJpw6F2ZWwgcXVlIGRhcyBvYnNlcnZhZGFzIGFjaW1hIHF1ZSBuw6NvIGRlbW9uc3RyYXJhbSB1bWEgZGlmZXJlbsOnYSB2aXN1YWwgZm9pIGEgZGUgQW1iacOnw6NvLCBhbGdvIGJlbSBpbnRlcmVzc2FudGUhDQoNCg0KUG9kZW1vcyBwb3J0YW50bywgaW5pY2lhciBub3NzbyBwbGFuZWphbWVudG8gZW0gY3JpYXIgdW0gbW9kZWxvIGRlIHJlZ3Jlc3PDo28gbG9nw61zdGljYSBxdWUgcmVhbGl6ZSB1bSB0cmVpbmFtZW50byBzb2JyZSBvcyBkYWRvcyBwcmVzZW50ZXMgZSBub3MgaW5mb3JtZSBvcyB2YWxvcmVzIGRlIGNvZWZpY2llbnRlcyBlbmNvbnRyYWRvcyENCg0KYGBge3J9DQptb2RlbF9tYXRjaGVzID0gZ2xtKGRlYyB+IGF0dHIgKyBzaW5jICsgaW50ZWwgKyBmdW4gKyBhbWIgKyBzaGFyLCANCiAgICAgICAgICAgICAgICAgIGRhdGEgPSBkYXRhLCANCiAgICAgICAgICAgICAgICAgIGZhbWlseSA9ICJiaW5vbWlhbCIpDQoNCnRpZHkobW9kZWxfbWF0Y2hlcywgY29uZi5pbnQgPSBUUlVFLCBleHBvbmVudGlhdGUgPSBUUlVFKQ0KYGBgDQoNCmBgYHtyfQ0Kc3VtbWFyeShtb2RlbF9tYXRjaGVzLCBjb25mLmludCA9IFRSVUUsIGV4cG9uZW50aWF0ZSA9IFRSVUUpDQpgYGANCg0KDQpEYWRvIG8gcmVzdWx0YWRvIGNhbGN1bGFkbyBwZWxvIG1vZGVsbyBhY2ltYSwgcG9kZW1vcyB2ZXIgcXVlIG8gaW1wYWN0byBwb3NpdGl2byBlc3TDo28gbWFpcyBhc3NvY2lhZG9zIGFzIHZhcmnDoXZlaXMgZGUgJ2Z1bicsICdhdHRyJyBlICdzaGFyJywgb3Ugc2VqYSwgcGVzc29hcyBkaXZlcnRpZGFzLCBhdHJhZW50ZXMgZSBxdWUgY29tcGFydGlsaGFtIG9zIG1lc21vcyBpbnRlcmVzc2VzIHJlc3BlY3RpdmFtZW50ZS4gDQoNClBhcmEgb3MgdmFsb3JlcyBlc3RpbWFkb3MgcGVsbyBub3NzbyBtb2RlbG8gZGUgcmVncmVzc8OjbyBsb2fDrXN0aWNhLCB0ZW1vcyBvcyBzZWd1aW50ZXMgcmVzdWx0YWRvczoNCg0KYXR0cjogU2VtcHJlIHF1ZSBvIGF0dHIgYXVtZW50YXIgdW1hIHVuaWRhZGUsIGFzIGNoYW5jZXMgZG8gbWF0Y2ggb2NvcnJlciDDqSBkZSA3NSUuIFNlZ3VuZG8gYSBub3NzYSBhbW9zdHJhLCBvICBJQyBlbmNvbnRyYWRvIGNvbSB1bWEgY29uZmlhbsOnYSBkZSA5NSUgcG9kZSB2YXJpYXIgZW50cmUgWzEuNjMsIDEuODJdLCBvdSBzZWphLCBjaGFuY2VzIGRlIDYzJSBhIDgyJS4NCg0Kc2luYzogU2VtcHJlIHF1ZSBvIHNpbmMgYXVtZW50YXIgdW1hIHVuaWRhZGUsIGFzIGNoYW5jZXMgZG8gbWF0Y2ggb2NvcnJlciDDqSBkZSAtMTIsOTMlLCAob3MgdmFsb3JlcyBwb2RlbSBzZXIgbWVsaG9yIHZpc3RvcyBuYSBzdW1hcml6YcOnw6NvIGRpc3Bvc3RhIG5hIGPDqWx1bGEgYWNpbWEpLiBTZWd1bmRvIGEgbm9zc2EgYW1vc3RyYSwgbyBJQyBlbmNvbnRyYWRvIGNvbSB1bWEgY29uZmlhbsOnYSBkZSA5NSUgcG9kZSB2YXJpYXIgZW50cmUgWzAuODI1LCAwLjkzNF0sIG91IHNlamEsIGNoYW5jZXMgZGUgLTAuMTc0MCUgYSAtMC4wNjU0JS4NCg0KaW50ZWw6IFNlbXByZSBxdWUgbyBpbnRlbCBhdW1lbnRhciB1bWEgdW5pZGFkZSwgYXMgY2hhbmNlcyBkbyBtYXRjaCBvY29ycmVyIMOpIGRlIDAuNjklLiBTZWd1bmRvIGEgbm9zc2EgYW1vc3RyYSwgbyAgSUMgZW5jb250cmFkbyBjb20gdW1hIGNvbmZpYW7Dp2EgZGUgOTUlIHBvZGUgdmFyaWFyIGVudHJlIFswLjkzMywgMS4wODVdLCBvdSBzZWphLCBjaGFuY2VzIGRlIC0wLjY2JSBhIDAuODUlLg0KDQpmdW46IFNlbXByZSBxdWUgbyBmdW4gYXVtZW50YXIgdW1hIHVuaWRhZGUsIGFzIGNoYW5jZXMgZG8gbWF0Y2ggb2NvcnJlciDDqSBkZSAyOSw0NCUuIFNlZ3VuZG8gYSBub3NzYSBhbW9zdHJhLCBvICBJQyBlbmNvbnRyYWRvIGNvbSB1bWEgY29uZmlhbsOnYSBkZSA5NSUgcG9kZSB2YXJpYXIgZW50cmUgWzEuMjEsIDEuMzddLCBvdSBzZWphLCBjaGFuY2VzIGRlIDIxJSBhIDM3JS4NCg0KYW1iOiBTZW1wcmUgcXVlIG8gYW1iIGF1bWVudGFyIHVtYSB1bmlkYWRlLCBhcyBjaGFuY2VzIGRvIG1hdGNoIG9jb3JyZXIgw6kgZGUgLTE3LDU5JS4gU2VndW5kbyBhIG5vc3NhIGFtb3N0cmEsIG8gIElDIGVuY29udHJhZG8gY29tIHVtYSBjb25maWFuw6dhIGRlIDk1JSBwb2RlIHZhcmlhciBlbnRyZSBbMC43NzYsIDAuODc0XSwgb3Ugc2VqYSwgY2hhbmNlcyBkZSAtMjIsNCUgYSAtMTIsNiUuDQoNCnNoYXI6IFNlbXByZSBxdWUgbyBzaGFyIGF1bWVudGFyIHVtYSB1bmlkYWRlLCBhcyBjaGFuY2VzIGRvIG1hdGNoIG9jb3JyZXIgw6kgZGUgMzgsMTQlLiBTZWd1bmRvIGEgbm9zc2EgYW1vc3RyYSwgbyAgSUMgZW5jb250cmFkbyBjb20gdW1hIGNvbmZpYW7Dp2EgZGUgOTUlIHBvZGUgdmFyaWFyIGVudHJlIFsxLjMxLCAxLjQ0OTBdLCBvdSBzZWphLCBjaGFuY2VzIGRlIDMxJSBhIDQ0JS4=