Carregamento do dados

Explorando dados para escolha de fatores

Dentre as informações presentes nesses dados, encontram-se informações sobre o que o avaliador achou do avaliado quanto à atração, sinceridade, ser engraçado, ambicioso e inteligente. Esses são fatores que provavelmente estão bem relacionados com a decisão de se encontrar novamente. Para analisar se isso se verifica na amostra de dados, esses foram plotados para cada uma dessa características.

No gráfico abaixo, é plotado o quão sincero o avaliador achou o avaliado em relação a se houve um encontro novamente.

dating %>% 
    drop_na(sinc)  %>% 
    ggplot(aes(x = sinc, y = dec)) + 
    geom_count() +
    NULL

No seguinte, plotou-se o quão engraçado o avaliador achou o avaliado. Percebe-se uma diferença quanto ao gráfico de sinceridade, há mais valores entre 7.5 e 10 - acha a pessoa bem engraçada - quando o encontro de fato aconteceu.

dating %>% 
    drop_na(fun)  %>% 
    ggplot(aes(x = fun, y = dec)) + 
    geom_count() +
    NULL

O gráfico a seguir é referente à atração que o avaliador senteiu pelo avaliado.

dating %>% 
    drop_na(attr)  %>% 
    ggplot(aes(x = attr, y = dec)) + 
    geom_count() +
    NULL

O gráfico a seguir é referente a quão ambicioso o avaliador achou o avaliado.

dating %>% 
    drop_na(amb)  %>% 
    ggplot(aes(x = amb, y = dec)) + 
    geom_count() +
    NULL

A seguir, o gráfico é referente a quão inteligente o avaliador achou o avaliado.

dating %>% 
    drop_na(intel)  %>% 
    ggplot(aes(x = intel, y = dec)) + 
    geom_count() +
    NULL

A seguir, o gráfico é referente à quanto o avaliador achou que compartilha interesses e hobbies com o avaliado.

dating %>% 
    drop_na(shar)  %>% 
    ggplot(aes(x = shar, y = dec)) + 
    geom_count() +
    NULL

Além dessas características, quiz se adicionar ao modelo o fator gênenro do avaliador.

dating %>% 
    drop_na(gender)  %>% 
    ggplot(aes(x = gender, y = dec)) + 
    geom_count() +
    NULL

O modelo

Após escolher os fatores que apresentam certa diferença entre se encontrar novamente (yes) ou não se encontrar novamente (no), podemos estimar um modelo baseado nesses fatores.

Para isso, os dados de interesse foram filtrados. A informação se houve ou não encontro novamente foi codificada em 1 e 0.


dating <- dating %>% 
  select(dec, gender, sinc, fun, intel, amb, attr, shar) %>%
  na.omit() %>%
  mutate(dec = ifelse(dec=="yes", 1, 0)) 

Foi então estimado o modelo por regressão logística:

\(\frac{p(dec)}{1 - p(dec)} = 0.005 * 1.38^{shar} 1.01^{intel} 0.83^{amb} 1.28^{fun} 1.74^{attr} 0.87^{sinc} 1.35^{gender}\)

Cada variável está mostrada abaixo:

sinc: quão sincero p1 achou p2

fun: quão divertido p1 achou p2

attr: quão atraente p1 achou p2

amb: quão ambicioso p1 achou p2

shar: quanto p1 achou que compartilha interesses e hobbies com p2

intel: quão inteligente p1 achou p2

A tabela a seguir mostra os coeficientes obtidos assim como os intervalos de confiança.

bm <- glm(dec ~ gender + sinc + fun + intel + amb + attr + shar, 
          data = dating, 
          family = "binomial")

tidy(bm, conf.int = TRUE, exponentiate = TRUE) %>% select(-p.value)

glance(bm)
pR2(bm)
fitting null model for pseudo-r2
          llh       llhNull            G2      McFadden          r2ML          r2CU 
-2062.0922894 -2812.7409190  1501.2972594     0.2668744     0.3048908     0.4097735 

Com a tabela e a figura abaixo, podemos entender melhor a contribuição de cada fator para o modelo estimado. O fator que apresenta maior efeito no “match” é a atração assumindo um coeficiente entre 1.64 e 1.84. EM seguida vem os fatores gênero, o quâo engraçada a pessoa foi e os assuntos em comuns que elas tem. A não ser pelo gênero, esses fatores já eram esperado que contribuissem para um match. Os fatores que impressinaram foi a inteligência que não representa nem um comportamento positivo nem negativo no match pois o intervalo de confiança apresenta essa incerteza (valores menores e maiores que 1). Em seguida, vem a sinceridade e ambição que geram um efeito de diminuição de probabilidade de dar o match.

tidy(bm, conf.int = TRUE, exponentiate = TRUE) %>% 
  filter(term != "(Intercept)") %>% 
  ggplot(aes(x = reorder(term, estimate), y = estimate, ymin = conf.low, ymax = conf.high)) + 
  geom_point() + 
  geom_linerange() + 
  coord_flip()

LS0tCnRpdGxlOiAiTDVQMzogUmVncmVzc8OjbyByb23Dom50aWNhIgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgoKYGBge3IgaW5jbHVkZT1GQUxTRSwgd2FybmluZz1GQUxTRX0KbGlicmFyeShwc2NsKQpsaWJyYXJ5KHRpZHl2ZXJzZSkKbGlicmFyeSh0aWR5bW9kZWxzKQpsaWJyYXJ5KHNraW1yKQpsaWJyYXJ5KG1vZGVscikgCgp0aGVtZV9zZXQodGhlbWVfYncoKSkKYGBgCgojIyBDYXJyZWdhbWVudG8gZG8gZGFkb3MKCgpgYGB7ciB3YXJuaW5nPUZBTFNFfQpkYXRpbmcgPC0gcmVhZF9jc3YoInNwZWVkLWRhdGluZy9zcGVlZC1kYXRpbmcyLmNzdiIpCgoKI3NraW0oZGF0aW5nKQoKCmBgYAoKIyMgRXhwbG9yYW5kbyBkYWRvcyBwYXJhIGVzY29saGEgZGUgZmF0b3JlcwoKRGVudHJlIGFzIGluZm9ybWHDp8O1ZXMgcHJlc2VudGVzIG5lc3NlcyBkYWRvcywgZW5jb250cmFtLXNlIGluZm9ybWHDp8O1ZXMgc29icmUgbyBxdWUgbyBhdmFsaWFkb3IgYWNob3UgZG8gYXZhbGlhZG8gcXVhbnRvIMOgIGF0cmHDp8Ojbywgc2luY2VyaWRhZGUsIHNlciBlbmdyYcOnYWRvLCBhbWJpY2lvc28gZSBpbnRlbGlnZW50ZS4gRXNzZXMgc8OjbyBmYXRvcmVzIHF1ZSBwcm92YXZlbG1lbnRlIGVzdMOjbyBiZW0gcmVsYWNpb25hZG9zIGNvbSBhIGRlY2lzw6NvIGRlIHNlIGVuY29udHJhciBub3ZhbWVudGUuIFBhcmEgYW5hbGlzYXIgc2UgaXNzbyBzZSB2ZXJpZmljYSBuYSBhbW9zdHJhIGRlIGRhZG9zLCBlc3NlcyBmb3JhbSBwbG90YWRvcyBwYXJhIGNhZGEgdW1hIGRlc3NhIGNhcmFjdGVyw61zdGljYXMuCgoKTm8gZ3LDoWZpY28gYWJhaXhvLCDDqSBwbG90YWRvIG8gcXXDo28gc2luY2VybyBvIGF2YWxpYWRvciBhY2hvdSBvIGF2YWxpYWRvIGVtIHJlbGHDp8OjbyBhIHNlIGhvdXZlIHVtIGVuY29udHJvIG5vdmFtZW50ZS4gCgpgYGB7cn0KZGF0aW5nICU+JSAKICAgIGRyb3BfbmEoc2luYykgICU+JSAKICAgIGdncGxvdChhZXMoeCA9IHNpbmMsIHkgPSBkZWMpKSArIAogICAgZ2VvbV9jb3VudCgpICsKICAgIE5VTEwKYGBgCgpObyBzZWd1aW50ZSwgcGxvdG91LXNlIG8gcXXDo28gZW5ncmHDp2FkbyBvIGF2YWxpYWRvciBhY2hvdSBvIGF2YWxpYWRvLiBQZXJjZWJlLXNlIHVtYSBkaWZlcmVuw6dhIHF1YW50byBhbyBncsOhZmljbyBkZSBzaW5jZXJpZGFkZSwgaMOhIG1haXMgdmFsb3JlcyBlbnRyZSA3LjUgZSAxMCAtIGFjaGEgYSBwZXNzb2EgYmVtIGVuZ3Jhw6dhZGEgLSBxdWFuZG8gbyBlbmNvbnRybyBkZSBmYXRvIGFjb250ZWNldS4KCmBgYHtyfQpkYXRpbmcgJT4lIAogICAgZHJvcF9uYShmdW4pICAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSBmdW4sIHkgPSBkZWMpKSArIAogICAgZ2VvbV9jb3VudCgpICsKICAgIE5VTEwKYGBgCgpPIGdyw6FmaWNvIGEgc2VndWlyIMOpIHJlZmVyZW50ZSDDoCBhdHJhw6fDo28gcXVlIG8gYXZhbGlhZG9yIHNlbnRlaXUgcGVsbyBhdmFsaWFkby4KCmBgYHtyfQpkYXRpbmcgJT4lIAogICAgZHJvcF9uYShhdHRyKSAgJT4lIAogICAgZ2dwbG90KGFlcyh4ID0gYXR0ciwgeSA9IGRlYykpICsgCiAgICBnZW9tX2NvdW50KCkgKwogICAgTlVMTApgYGAKCk8gZ3LDoWZpY28gYSBzZWd1aXIgw6kgcmVmZXJlbnRlIGEgcXXDo28gYW1iaWNpb3NvIG8gYXZhbGlhZG9yIGFjaG91IG8gYXZhbGlhZG8uCgpgYGB7cn0KZGF0aW5nICU+JSAKICAgIGRyb3BfbmEoYW1iKSAgJT4lIAogICAgZ2dwbG90KGFlcyh4ID0gYW1iLCB5ID0gZGVjKSkgKyAKICAgIGdlb21fY291bnQoKSArCiAgICBOVUxMCmBgYAoKQSBzZWd1aXIsIG8gZ3LDoWZpY28gw6kgcmVmZXJlbnRlIGEgcXXDo28gaW50ZWxpZ2VudGUgbyBhdmFsaWFkb3IgYWNob3UgbyBhdmFsaWFkby4KCmBgYHtyfQpkYXRpbmcgJT4lIAogICAgZHJvcF9uYShpbnRlbCkgICU+JSAKICAgIGdncGxvdChhZXMoeCA9IGludGVsLCB5ID0gZGVjKSkgKyAKICAgIGdlb21fY291bnQoKSArCiAgICBOVUxMCmBgYAoKCkEgc2VndWlyLCBvIGdyw6FmaWNvIMOpIHJlZmVyZW50ZSDDoCBxdWFudG8gbyBhdmFsaWFkb3IgYWNob3UgcXVlIGNvbXBhcnRpbGhhIGludGVyZXNzZXMgZSBob2JiaWVzIGNvbSBvIGF2YWxpYWRvLgoKYGBge3J9CmRhdGluZyAlPiUgCiAgICBkcm9wX25hKHNoYXIpICAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSBzaGFyLCB5ID0gZGVjKSkgKyAKICAgIGdlb21fY291bnQoKSArCiAgICBOVUxMCmBgYAoKQWzDqW0gZGVzc2FzIGNhcmFjdGVyw61zdGljYXMsIHF1aXogc2UgYWRpY2lvbmFyIGFvIG1vZGVsbyBvIGZhdG9yIGfDqm5lbnJvIGRvIGF2YWxpYWRvci4KCmBgYHtyfQpkYXRpbmcgJT4lIAogICAgZHJvcF9uYShnZW5kZXIpICAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSBnZW5kZXIsIHkgPSBkZWMpKSArIAogICAgZ2VvbV9jb3VudCgpICsKICAgIE5VTEwKYGBgCgojIyBPIG1vZGVsbwoKQXDDs3MgZXNjb2xoZXIgb3MgZmF0b3JlcyBxdWUgYXByZXNlbnRhbSBjZXJ0YSBkaWZlcmVuw6dhIGVudHJlIHNlIGVuY29udHJhciBub3ZhbWVudGUgKHllcykgb3UgbsOjbyBzZSBlbmNvbnRyYXIgbm92YW1lbnRlIChubyksIHBvZGVtb3MgZXN0aW1hciB1bSBtb2RlbG8gYmFzZWFkbyBuZXNzZXMgZmF0b3Jlcy4KCgpQYXJhIGlzc28sIG9zIGRhZG9zIGRlIGludGVyZXNzZSBmb3JhbSBmaWx0cmFkb3MuIEEgaW5mb3JtYcOnw6NvIHNlIGhvdXZlIG91IG7Do28gZW5jb250cm8gbm92YW1lbnRlIGZvaSBjb2RpZmljYWRhIGVtIDEgZSAwLgoKYGBge3J9CgpkYXRpbmcgPC0gZGF0aW5nICU+JSAKICBzZWxlY3QoZGVjLCBnZW5kZXIsIHNpbmMsIGZ1biwgaW50ZWwsIGFtYiwgYXR0ciwgc2hhcikgJT4lCiAgbmEub21pdCgpICU+JQogIG11dGF0ZShkZWMgPSBpZmVsc2UoZGVjPT0ieWVzIiwgMSwgMCkpIAoKYGBgCgoKRm9pIGVudMOjbyBlc3RpbWFkbyBvIG1vZGVsbyBwb3IgcmVncmVzc8OjbyBsb2fDrXN0aWNhOgoKJFxmcmFje3AoZGVjKX17MSAtIHAoZGVjKX0gPSAwLjAwNSAqIDEuMzhee3NoYXJ9IDEuMDFee2ludGVsfSAwLjgzXnthbWJ9IDEuMjhee2Z1bn0gMS43NF57YXR0cn0gMC44N157c2luY30gMS4zNV57Z2VuZGVyfSQKCkNhZGEgdmFyacOhdmVsIGVzdMOhIG1vc3RyYWRhIGFiYWl4bzoKCnNpbmM6ICBxdcOjbyBzaW5jZXJvIHAxIGFjaG91IHAyCgpmdW46IHF1w6NvIGRpdmVydGlkbyBwMSBhY2hvdSBwMgoKYXR0cjogcXXDo28gYXRyYWVudGUgcDEgIGFjaG91IHAyCgphbWI6IHF1w6NvIGFtYmljaW9zbyBwMSAgYWNob3UgcDIKCnNoYXI6IHF1YW50byBwMSBhY2hvdSBxdWUgY29tcGFydGlsaGEgaW50ZXJlc3NlcyBlIGhvYmJpZXMgY29tIHAyCgppbnRlbDogIHF1w6NvIGludGVsaWdlbnRlIHAxICBhY2hvdSBwMgoKCkEgdGFiZWxhIGEgc2VndWlyIG1vc3RyYSBvcyBjb2VmaWNpZW50ZXMgb2J0aWRvcyBhc3NpbSBjb21vIG9zIGludGVydmFsb3MgZGUgY29uZmlhbsOnYS4KCmBgYHtyfQpibSA8LSBnbG0oZGVjIH4gZ2VuZGVyICsgc2luYyArIGZ1biArIGludGVsICsgYW1iICsgYXR0ciArIHNoYXIsIAogICAgICAgICAgZGF0YSA9IGRhdGluZywgCiAgICAgICAgICBmYW1pbHkgPSAiYmlub21pYWwiKQoKdGlkeShibSwgY29uZi5pbnQgPSBUUlVFLCBleHBvbmVudGlhdGUgPSBUUlVFKSAlPiUgc2VsZWN0KC1wLnZhbHVlKQoKZ2xhbmNlKGJtKQpwUjIoYm0pCgpgYGAKCgpDb20gYSB0YWJlbGEgZSBhIGZpZ3VyYSBhYmFpeG8sIHBvZGVtb3MgZW50ZW5kZXIgbWVsaG9yIGEgY29udHJpYnVpw6fDo28gZGUgY2FkYSBmYXRvciBwYXJhIG8gbW9kZWxvIGVzdGltYWRvLiBPIGZhdG9yIHF1ZSBhcHJlc2VudGEgbWFpb3IgZWZlaXRvIG5vICJtYXRjaCIgw6kgYSBhdHJhw6fDo28gYXNzdW1pbmRvIHVtIGNvZWZpY2llbnRlIGVudHJlIDEuNjQgZSAxLjg0LiBFTSBzZWd1aWRhIHZlbSBvcyBmYXRvcmVzIGfDqm5lcm8sIG8gcXXDom8gZW5ncmHDp2FkYSBhIHBlc3NvYSBmb2kgZSBvcyBhc3N1bnRvcyBlbSBjb211bnMgcXVlIGVsYXMgdGVtLiBBIG7Do28gc2VyIHBlbG8gZ8OqbmVybywgZXNzZXMgZmF0b3JlcyBqw6EgZXJhbSBlc3BlcmFkbyBxdWUgY29udHJpYnVpc3NlbSBwYXJhIHVtIG1hdGNoLiBPcyBmYXRvcmVzIHF1ZSBpbXByZXNzaW5hcmFtIGZvaSBhIGludGVsaWfDqm5jaWEgcXVlIG7Do28gcmVwcmVzZW50YSBuZW0gdW0gY29tcG9ydGFtZW50byBwb3NpdGl2byBuZW0gbmVnYXRpdm8gbm8gbWF0Y2ggcG9pcyBvIGludGVydmFsbyBkZSBjb25maWFuw6dhIGFwcmVzZW50YSBlc3NhIGluY2VydGV6YSAodmFsb3JlcyBtZW5vcmVzIGUgbWFpb3JlcyBxdWUgMSkuIEVtIHNlZ3VpZGEsIHZlbSBhIHNpbmNlcmlkYWRlIGUgYW1iacOnw6NvIHF1ZSBnZXJhbSB1bSBlZmVpdG8gZGUgZGltaW51acOnw6NvIGRlIHByb2JhYmlsaWRhZGUgZGUgZGFyIG8gbWF0Y2guCgpgYGB7cn0KdGlkeShibSwgY29uZi5pbnQgPSBUUlVFLCBleHBvbmVudGlhdGUgPSBUUlVFKSAlPiUgCiAgZmlsdGVyKHRlcm0gIT0gIihJbnRlcmNlcHQpIikgJT4lIAogIGdncGxvdChhZXMoeCA9IHJlb3JkZXIodGVybSwgZXN0aW1hdGUpLCB5ID0gZXN0aW1hdGUsIHltaW4gPSBjb25mLmxvdywgeW1heCA9IGNvbmYuaGlnaCkpICsgCiAgZ2VvbV9wb2ludCgpICsgCiAgZ2VvbV9saW5lcmFuZ2UoKSArIAogIGNvb3JkX2ZsaXAoKQpgYGAKCgo=