library(tidyverse)
library(broom)

data = read_csv("https://raw.githubusercontent.com/nazareno/ciencia-de-dados-1/master/5-regressao/speed-dating/speed-dating2.csv", col_types = 
               cols(
                      .default = col_double(),
                      field = col_character(),
                      from = col_character(),
                      career = col_character(),
                      dec = col_character()
                    )) %>% 
        mutate(dec = as.factor(dec), gender = as.factor(gender))

Os dados que iremos utilizar para essa análise são dados que descrevem 5000 encontros relâmpagos ou speed dating. Esses encontros ocorrem em uma duração de 4 minutos e envolveram 310 jovens dos Estados Unidos. Logo após esses encontros, os jovens preenchiam uma ficha avaliando as pessoas com quem eles se encontraram.

As fichas preenchidas continham diversas informações, entre elas estão:

A informação que queremos entender desses dados é se elas são um conjunto de variáveis explicativas para predizer o valor de ‘dec’. E o que seria o dec ? É simplesmente se houve um match entre as pessoas nesse encontro.

Para começar, vamos visualizar a distribuição dos valores de cada uma dessas variáveis em relação ao ‘dec’.

Attr:

data %>%
    ggplot(aes(x = dec, y = attr)) +
    geom_violin(aes(fill = dec)) + 
    geom_count() +
    coord_flip()
Warning: Removed 118 rows containing non-finite values (stat_ydensity).
Warning: Removed 118 rows containing non-finite values (stat_sum).

Sinc:

data %>%
    ggplot(aes(x = dec, y = sinc)) +
    geom_violin(aes(fill = dec)) + 
    geom_count() +
    coord_flip()
Warning: Removed 161 rows containing non-finite values (stat_ydensity).
Warning: Removed 161 rows containing non-finite values (stat_sum).

Intel:

data %>%
    ggplot(aes(x = dec, y = intel)) +
    geom_violin(aes(fill = dec)) + 
    geom_count() +
    coord_flip()
Warning: Removed 166 rows containing non-finite values (stat_ydensity).
Warning: Removed 166 rows containing non-finite values (stat_sum).

Fun:

data %>%
    ggplot(aes(x = dec, y = fun)) +
    geom_violin(aes(fill = dec)) + 
    geom_count() +
    coord_flip()
Warning: Removed 197 rows containing non-finite values (stat_ydensity).
Warning: Removed 197 rows containing non-finite values (stat_sum).

Amb:

data %>%
    ggplot(aes(x = dec, y = amb)) +
    geom_violin(aes(fill = dec)) + 
    geom_count() +
    coord_flip()
Warning: Removed 421 rows containing non-finite values (stat_ydensity).
Warning: Removed 421 rows containing non-finite values (stat_sum).

Shar:

data %>%
    ggplot(aes(x = dec, y = shar)) +
    geom_violin(aes(fill = dec)) + 
    geom_count() +
    coord_flip()
Warning: Removed 643 rows containing non-finite values (stat_ydensity).
Warning: Removed 643 rows containing non-finite values (stat_sum).

É interessante ver que para quase todo, as variáveis influenciaram de alguma maneira, nem que seja pouca, para que a pessoa_1 marcasse um match com a pessoa_2. Um variável que das observadas acima que não demonstraram uma diferença visual foi a de Ambição, algo bem interessante!

Podemos portanto, iniciar nosso planejamento em criar um modelo de regressão logística que realize um treinamento sobre os dados presentes e nos informe os valores de coeficientes encontrados!

model_matches = glm(dec ~ attr + sinc + intel + fun + amb + shar, 
                  data = data, 
                  family = "binomial")

tidy(model_matches, conf.int = TRUE, exponentiate = TRUE)
summary(model_matches, conf.int = TRUE, exponentiate = TRUE)

Call:
glm(formula = dec ~ attr + sinc + intel + fun + amb + shar, family = "binomial", 
    data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.5362  -0.8250  -0.3011   0.8392   3.4541  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -5.104342   0.243021 -21.004  < 2e-16 ***
attr         0.562454   0.028511  19.727  < 2e-16 ***
sinc        -0.129343   0.031508  -4.105 4.04e-05 ***
intel        0.006912   0.038536   0.179    0.858    
fun          0.258122   0.030706   8.406  < 2e-16 ***
amb         -0.193363   0.030457  -6.349 2.17e-10 ***
shar         0.323160   0.024189  13.360  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5625.5  on 4127  degrees of freedom
Residual deviance: 4139.8  on 4121  degrees of freedom
  (790 observations deleted due to missingness)
AIC: 4153.8

Number of Fisher Scoring iterations: 5

Dado o resultado calculado pelo modelo acima, podemos ver que o impacto positivo estão mais associados as variáveis de ‘fun’, ‘attr’ e ‘shar’, ou seja, pessoas divertidas, atraentes e que compartilham os mesmos interesses respectivamente.

Para os valores estimados pelo nosso modelo de regressão logística, temos os seguintes resultados:

attr: Sempre que o attr aumentar uma unidade, as chances do match ocorrer é de 75%. Segundo a nossa amostra, o IC encontrado com uma confiança de 95% pode variar entre [1.63, 1.82], ou seja, chances de 63% a 82%.

sinc: Sempre que o sinc aumentar uma unidade, as chances do match ocorrer é de -12,93%, (os valores podem ser melhor vistos na sumarização disposta na célula acima). Segundo a nossa amostra, o IC encontrado com uma confiança de 95% pode variar entre [0.825, 0.934], ou seja, chances de -0.1740% a -0.0654%.

intel: Sempre que o intel aumentar uma unidade, as chances do match ocorrer é de 0.69%. Segundo a nossa amostra, o IC encontrado com uma confiança de 95% pode variar entre [0.933, 1.085], ou seja, chances de -0.66% a 0.85%.

fun: Sempre que o fun aumentar uma unidade, as chances do match ocorrer é de 29,44%. Segundo a nossa amostra, o IC encontrado com uma confiança de 95% pode variar entre [1.21, 1.37], ou seja, chances de 21% a 37%.

amb: Sempre que o amb aumentar uma unidade, as chances do match ocorrer é de -17,59%. Segundo a nossa amostra, o IC encontrado com uma confiança de 95% pode variar entre [0.776, 0.874], ou seja, chances de -22,4% a -12,6%.

shar: Sempre que o shar aumentar uma unidade, as chances do match ocorrer é de 38,14%. Segundo a nossa amostra, o IC encontrado com uma confiança de 95% pode variar entre [1.31, 1.4490], ou seja, chances de 31% a 44%.

LS0tDQp0aXRsZTogIlJlZ3Jlc3PDo28gTG9nw61zdGljYTogVXRpbGl6YW5kbyBkYWRvcyByb23Dom50aWNvcyINCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KLS0tDQoNCmBgYHtyfQ0KbGlicmFyeSh0aWR5dmVyc2UpDQpsaWJyYXJ5KGJyb29tKQ0KDQpkYXRhID0gcmVhZF9jc3YoImh0dHBzOi8vcmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbS9uYXphcmVuby9jaWVuY2lhLWRlLWRhZG9zLTEvbWFzdGVyLzUtcmVncmVzc2FvL3NwZWVkLWRhdGluZy9zcGVlZC1kYXRpbmcyLmNzdiIsIGNvbF90eXBlcyA9IA0KICAgICAgICAgICAgICAgY29scygNCiAgICAgICAgICAgICAgICAgICAgICAuZGVmYXVsdCA9IGNvbF9kb3VibGUoKSwNCiAgICAgICAgICAgICAgICAgICAgICBmaWVsZCA9IGNvbF9jaGFyYWN0ZXIoKSwNCiAgICAgICAgICAgICAgICAgICAgICBmcm9tID0gY29sX2NoYXJhY3RlcigpLA0KICAgICAgICAgICAgICAgICAgICAgIGNhcmVlciA9IGNvbF9jaGFyYWN0ZXIoKSwNCiAgICAgICAgICAgICAgICAgICAgICBkZWMgPSBjb2xfY2hhcmFjdGVyKCkNCiAgICAgICAgICAgICAgICAgICAgKSkgJT4lIA0KICAgICAgICBtdXRhdGUoZGVjID0gYXMuZmFjdG9yKGRlYyksIGdlbmRlciA9IGFzLmZhY3RvcihnZW5kZXIpKQ0KYGBgDQoNCk9zIGRhZG9zIHF1ZSBpcmVtb3MgdXRpbGl6YXIgcGFyYSBlc3NhIGFuw6FsaXNlIHPDo28gZGFkb3MgcXVlIGRlc2NyZXZlbSA1MDAwIGVuY29udHJvcyByZWzDom1wYWdvcyBvdSBzcGVlZCBkYXRpbmcuIA0KRXNzZXMgZW5jb250cm9zIG9jb3JyZW0gZW0gdW1hIGR1cmHDp8OjbyBkZSA0IG1pbnV0b3MgZSBlbnZvbHZlcmFtIDMxMCBqb3ZlbnMgZG9zIEVzdGFkb3MgVW5pZG9zLiBMb2dvIGFww7NzIGVzc2VzIGVuY29udHJvcywgb3Mgam92ZW5zIHByZWVuY2hpYW0gdW1hIGZpY2hhIGF2YWxpYW5kbyBhcyBwZXNzb2FzIGNvbSBxdWVtIGVsZXMgc2UgZW5jb250cmFyYW0uDQoNCg0KQXMgZmljaGFzIHByZWVuY2hpZGFzIGNvbnRpbmhhbSBkaXZlcnNhcyBpbmZvcm1hw6fDtWVzLCBlbnRyZSBlbGFzIGVzdMOjbzoNCg0KLSAqYXR0cio6IEluZGljYSBvIHF1w6NvIGF0cmFlbnRlIGEgcGVzc29hXzEgYWNob3UgZGEgcGVzc29hXzIgDQotICpzaW5jKjogSW5kaWNhIG8gcXXDo28gc2luY2VyYSBhIHBlc3NvYV8xIGFjaG91IGRhIHBlc3NvYV8yIA0KLSAqaW50ZWwqOiBJbmRpY2EgbyBxdcOjbyBpbnRlbGlnZW50ZSBhIHBlc3NvYV8xIGFjaG91IGRhIHBlc3NvYV8yIA0KLSAqZnVuKjogSW5kaWNhIG8gcXXDo28gZGl2ZXJ0aWRhIGEgcGVzc29hXzEgYWNob3UgZGEgcGVzc29hXzIgDQotICphbWIqOiBJbmRpY2EgbyBxdcOjbyBhbWJpY2lvc2EgYSBwZXNzb2FfMSBhY2hvdSBkYSBwZXNzb2FfMg0KLSAqc2hhcio6IEluZGljYSBvIHF1YW50byBhIHBlc3NvYV8xIGNvbXBhcnRpbGhhIGRlIGludGVyZXNzZXMgc2VtZWxoYW50ZXMgY29tIGEgcGVzc29hXzINCg0KQSBpbmZvcm1hw6fDo28gcXVlIHF1ZXJlbW9zIGVudGVuZGVyIGRlc3NlcyBkYWRvcyDDqSBzZSBlbGFzIHPDo28gdW0gY29uanVudG8gZGUgdmFyacOhdmVpcyBleHBsaWNhdGl2YXMgcGFyYSBwcmVkaXplciBvIHZhbG9yIGRlICdkZWMnLiBFIG8gcXVlIHNlcmlhIG8gZGVjID8gw4kgc2ltcGxlc21lbnRlIHNlIGhvdXZlIHVtIG1hdGNoIGVudHJlIGFzIHBlc3NvYXMgbmVzc2UgZW5jb250cm8uDQoNClBhcmEgY29tZcOnYXIsIHZhbW9zIHZpc3VhbGl6YXIgYSBkaXN0cmlidWnDp8OjbyBkb3MgdmFsb3JlcyBkZSBjYWRhIHVtYSBkZXNzYXMgdmFyacOhdmVpcyBlbSByZWxhw6fDo28gYW8gJ2RlYycuDQoNCiMjIyMgQXR0cjoNCmBgYHtyfQ0KZGF0YSAlPiUNCiAgICBnZ3Bsb3QoYWVzKHggPSBkZWMsIHkgPSBhdHRyKSkgKw0KICAgIGdlb21fdmlvbGluKGFlcyhmaWxsID0gZGVjKSkgKyANCiAgICBnZW9tX2NvdW50KCkgKw0KICAgIGNvb3JkX2ZsaXAoKQ0KYGBgDQoNCiMjIyMgU2luYzoNCmBgYHtyfQ0KZGF0YSAlPiUNCiAgICBnZ3Bsb3QoYWVzKHggPSBkZWMsIHkgPSBzaW5jKSkgKw0KICAgIGdlb21fdmlvbGluKGFlcyhmaWxsID0gZGVjKSkgKyANCiAgICBnZW9tX2NvdW50KCkgKw0KICAgIGNvb3JkX2ZsaXAoKQ0KYGBgDQoNCiMjIyMgSW50ZWw6DQpgYGB7cn0NCmRhdGEgJT4lDQogICAgZ2dwbG90KGFlcyh4ID0gZGVjLCB5ID0gaW50ZWwpKSArDQogICAgZ2VvbV92aW9saW4oYWVzKGZpbGwgPSBkZWMpKSArIA0KICAgIGdlb21fY291bnQoKSArDQogICAgY29vcmRfZmxpcCgpDQpgYGANCg0KIyMjIyBGdW46DQpgYGB7cn0NCmRhdGEgJT4lDQogICAgZ2dwbG90KGFlcyh4ID0gZGVjLCB5ID0gZnVuKSkgKw0KICAgIGdlb21fdmlvbGluKGFlcyhmaWxsID0gZGVjKSkgKyANCiAgICBnZW9tX2NvdW50KCkgKw0KICAgIGNvb3JkX2ZsaXAoKQ0KYGBgDQoNCiMjIyMgQW1iOg0KYGBge3J9DQpkYXRhICU+JQ0KICAgIGdncGxvdChhZXMoeCA9IGRlYywgeSA9IGFtYikpICsNCiAgICBnZW9tX3Zpb2xpbihhZXMoZmlsbCA9IGRlYykpICsgDQogICAgZ2VvbV9jb3VudCgpICsNCiAgICBjb29yZF9mbGlwKCkNCmBgYA0KDQojIyMjIFNoYXI6DQpgYGB7cn0NCmRhdGEgJT4lDQogICAgZ2dwbG90KGFlcyh4ID0gZGVjLCB5ID0gc2hhcikpICsNCiAgICBnZW9tX3Zpb2xpbihhZXMoZmlsbCA9IGRlYykpICsgDQogICAgZ2VvbV9jb3VudCgpICsNCiAgICBjb29yZF9mbGlwKCkNCmBgYA0KDQrDiSBpbnRlcmVzc2FudGUgdmVyIHF1ZSBwYXJhIHF1YXNlIHRvZG8sIGFzIHZhcmnDoXZlaXMgaW5mbHVlbmNpYXJhbSBkZSBhbGd1bWEgbWFuZWlyYSwgbmVtIHF1ZSBzZWphIHBvdWNhLCBwYXJhIHF1ZSBhIHBlc3NvYV8xIG1hcmNhc3NlIHVtIG1hdGNoIGNvbSBhIHBlc3NvYV8yLiBVbSB2YXJpw6F2ZWwgcXVlIGRhcyBvYnNlcnZhZGFzIGFjaW1hIHF1ZSBuw6NvIGRlbW9uc3RyYXJhbSB1bWEgZGlmZXJlbsOnYSB2aXN1YWwgZm9pIGEgZGUgQW1iacOnw6NvLCBhbGdvIGJlbSBpbnRlcmVzc2FudGUhDQoNCg0KUG9kZW1vcyBwb3J0YW50bywgaW5pY2lhciBub3NzbyBwbGFuZWphbWVudG8gZW0gY3JpYXIgdW0gbW9kZWxvIGRlIHJlZ3Jlc3PDo28gbG9nw61zdGljYSBxdWUgcmVhbGl6ZSB1bSB0cmVpbmFtZW50byBzb2JyZSBvcyBkYWRvcyBwcmVzZW50ZXMgZSBub3MgaW5mb3JtZSBvcyB2YWxvcmVzIGRlIGNvZWZpY2llbnRlcyBlbmNvbnRyYWRvcyENCg0KYGBge3J9DQptb2RlbF9tYXRjaGVzID0gZ2xtKGRlYyB+IGF0dHIgKyBzaW5jICsgaW50ZWwgKyBmdW4gKyBhbWIgKyBzaGFyLCANCiAgICAgICAgICAgICAgICAgIGRhdGEgPSBkYXRhLCANCiAgICAgICAgICAgICAgICAgIGZhbWlseSA9ICJiaW5vbWlhbCIpDQoNCnRpZHkobW9kZWxfbWF0Y2hlcywgY29uZi5pbnQgPSBUUlVFLCBleHBvbmVudGlhdGUgPSBUUlVFKQ0KYGBgDQoNCmBgYHtyfQ0Kc3VtbWFyeShtb2RlbF9tYXRjaGVzLCBjb25mLmludCA9IFRSVUUsIGV4cG9uZW50aWF0ZSA9IFRSVUUpDQpgYGANCg0KDQpEYWRvIG8gcmVzdWx0YWRvIGNhbGN1bGFkbyBwZWxvIG1vZGVsbyBhY2ltYSwgcG9kZW1vcyB2ZXIgcXVlIG8gaW1wYWN0byBwb3NpdGl2byBlc3TDo28gbWFpcyBhc3NvY2lhZG9zIGFzIHZhcmnDoXZlaXMgZGUgJ2Z1bicsICdhdHRyJyBlICdzaGFyJywgb3Ugc2VqYSwgcGVzc29hcyBkaXZlcnRpZGFzLCBhdHJhZW50ZXMgZSBxdWUgY29tcGFydGlsaGFtIG9zIG1lc21vcyBpbnRlcmVzc2VzIHJlc3BlY3RpdmFtZW50ZS4gDQoNClBhcmEgb3MgdmFsb3JlcyBlc3RpbWFkb3MgcGVsbyBub3NzbyBtb2RlbG8gZGUgcmVncmVzc8OjbyBsb2fDrXN0aWNhLCB0ZW1vcyBvcyBzZWd1aW50ZXMgcmVzdWx0YWRvczoNCg0KYXR0cjogU2VtcHJlIHF1ZSBvIGF0dHIgYXVtZW50YXIgdW1hIHVuaWRhZGUsIGFzIGNoYW5jZXMgZG8gbWF0Y2ggb2NvcnJlciDDqSBkZSA3NSUuIFNlZ3VuZG8gYSBub3NzYSBhbW9zdHJhLCBvICBJQyBlbmNvbnRyYWRvIGNvbSB1bWEgY29uZmlhbsOnYSBkZSA5NSUgcG9kZSB2YXJpYXIgZW50cmUgWzEuNjMsIDEuODJdLCBvdSBzZWphLCBjaGFuY2VzIGRlIDYzJSBhIDgyJS4NCg0Kc2luYzogU2VtcHJlIHF1ZSBvIHNpbmMgYXVtZW50YXIgdW1hIHVuaWRhZGUsIGFzIGNoYW5jZXMgZG8gbWF0Y2ggb2NvcnJlciDDqSBkZSAtMTIsOTMlLCAob3MgdmFsb3JlcyBwb2RlbSBzZXIgbWVsaG9yIHZpc3RvcyBuYSBzdW1hcml6YcOnw6NvIGRpc3Bvc3RhIG5hIGPDqWx1bGEgYWNpbWEpLiBTZWd1bmRvIGEgbm9zc2EgYW1vc3RyYSwgbyBJQyBlbmNvbnRyYWRvIGNvbSB1bWEgY29uZmlhbsOnYSBkZSA5NSUgcG9kZSB2YXJpYXIgZW50cmUgWzAuODI1LCAwLjkzNF0sIG91IHNlamEsIGNoYW5jZXMgZGUgLTAuMTc0MCUgYSAtMC4wNjU0JS4NCg0KaW50ZWw6IFNlbXByZSBxdWUgbyBpbnRlbCBhdW1lbnRhciB1bWEgdW5pZGFkZSwgYXMgY2hhbmNlcyBkbyBtYXRjaCBvY29ycmVyIMOpIGRlIDAuNjklLiBTZWd1bmRvIGEgbm9zc2EgYW1vc3RyYSwgbyAgSUMgZW5jb250cmFkbyBjb20gdW1hIGNvbmZpYW7Dp2EgZGUgOTUlIHBvZGUgdmFyaWFyIGVudHJlIFswLjkzMywgMS4wODVdLCBvdSBzZWphLCBjaGFuY2VzIGRlIC0wLjY2JSBhIDAuODUlLg0KDQpmdW46IFNlbXByZSBxdWUgbyBmdW4gYXVtZW50YXIgdW1hIHVuaWRhZGUsIGFzIGNoYW5jZXMgZG8gbWF0Y2ggb2NvcnJlciDDqSBkZSAyOSw0NCUuIFNlZ3VuZG8gYSBub3NzYSBhbW9zdHJhLCBvICBJQyBlbmNvbnRyYWRvIGNvbSB1bWEgY29uZmlhbsOnYSBkZSA5NSUgcG9kZSB2YXJpYXIgZW50cmUgWzEuMjEsIDEuMzddLCBvdSBzZWphLCBjaGFuY2VzIGRlIDIxJSBhIDM3JS4NCg0KYW1iOiBTZW1wcmUgcXVlIG8gYW1iIGF1bWVudGFyIHVtYSB1bmlkYWRlLCBhcyBjaGFuY2VzIGRvIG1hdGNoIG9jb3JyZXIgw6kgZGUgLTE3LDU5JS4gU2VndW5kbyBhIG5vc3NhIGFtb3N0cmEsIG8gIElDIGVuY29udHJhZG8gY29tIHVtYSBjb25maWFuw6dhIGRlIDk1JSBwb2RlIHZhcmlhciBlbnRyZSBbMC43NzYsIDAuODc0XSwgb3Ugc2VqYSwgY2hhbmNlcyBkZSAtMjIsNCUgYSAtMTIsNiUuDQoNCnNoYXI6IFNlbXByZSBxdWUgbyBzaGFyIGF1bWVudGFyIHVtYSB1bmlkYWRlLCBhcyBjaGFuY2VzIGRvIG1hdGNoIG9jb3JyZXIgw6kgZGUgMzgsMTQlLiBTZWd1bmRvIGEgbm9zc2EgYW1vc3RyYSwgbyAgSUMgZW5jb250cmFkbyBjb20gdW1hIGNvbmZpYW7Dp2EgZGUgOTUlIHBvZGUgdmFyaWFyIGVudHJlIFsxLjMxLCAxLjQ0OTBdLCBvdSBzZWphLCBjaGFuY2VzIGRlIDMxJSBhIDQ0JS4=