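The pre-event questionnaire records how much each participant likes various activities.
Each person's own ratings of these interests (1-10) are the explanatory variables.
Whether the partner is interested in that person (dec_o) is the response variable.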
\(dec_{pred}= \\ \beta_0+\beta_1\ sports+\beta_2\ tvsports+\beta_3\ exercise+\beta_4\ dining+\beta_5\ museums \\ \ \ +\beta_6\ art+\beta_7\ hiking+\beta_8\ gaming+\beta_9\ clubbing+\beta_{10}\ reading+\beta_{11}\ tv \\ \ \ +\beta_{12}\ theater+\beta_{13}\ movies+\beta_{14}\ concerts+\beta_{15}\ music+\beta_{16}\ shopping\\\ \ +\beta_{17}\ yoga\)
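The models in these notes are fit with family = binomial(link = "logit"), so \(dec_{pred}\) is best read as a log-odds score rather than a probability. Under that reading (an interpretation added here, with \(p\) denoting the probability of a "yes" decision), the two scales are related by:

\(p = \frac{1}{1+e^{-dec_{pred}}} \quad\Longleftrightarrow\quad \log\frac{p}{1-p} = dec_{pred}\)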
##                  Estimate Std. Error      z value     Pr(>|z|)
## (Intercept) -9.699499e-01 0.22588400 -4.294017896 1.754683e-05
## sports       4.396830e-02 0.01360625  3.231477886 1.231519e-03
## tvsports    -4.164909e-02 0.01220659 -3.412017613 6.448395e-04
## exercise     3.937751e-02 0.01264759  3.113440461 1.849198e-03
## dining       3.568129e-02 0.01831000  1.948732524 5.132738e-02
## museums     -8.176128e-05 0.02747476 -0.002975868 9.976256e-01
## art          4.771713e-03 0.02382029  0.200321402 8.412292e-01
## hiking       3.268697e-02 0.01181010  2.767712358 5.645125e-03
## gaming      -5.778836e-02 0.01149035 -5.029294471 4.922879e-07
## clubbing     4.421329e-02 0.01155367  3.826774246 1.298335e-04
## reading     -1.099251e-02 0.01465726 -0.749970135 4.532727e-01
## tv          -1.468673e-02 0.01370993 -1.071248172 2.840579e-01
## theater     -2.280711e-02 0.01699074 -1.342326062 1.794903e-01
## movies      -2.991890e-02 0.02004273 -1.492755758 1.355011e-01
## concerts    -2.531595e-02 0.01840439 -1.375538457 1.689646e-01
## music        2.590403e-02 0.02018995  1.283016432 1.994863e-01
## shopping     5.367783e-02 0.01326894  4.045373347 5.223980e-05
## yoga         4.590679e-03 0.01104360  0.415686942 6.776391e-01
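Because the model is fit on the logit scale, each estimate is a change in log-odds per one-point increase in a rating, and exponentiating it gives an odds ratio. A minimal sketch using a few estimates copied from the table above (the names `est` and `odds_ratio` are illustrative only):

```r
# Estimates copied from the coefficient table above (log-odds scale).
est <- c(sports = 4.396830e-02, tvsports = -4.164909e-02,
         gaming = -5.778836e-02, shopping = 5.367783e-02)

# exp() turns an additive change in log-odds into a multiplicative
# change in the odds for a one-point increase in the rating.
odds_ratio <- exp(est)
print(round(odds_ratio, 3))
```

Odds ratios above 1 (sports, shopping) go with higher odds that the partner says yes; those below 1 (tvsports, gaming) go with lower odds.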
# Drop the nine interests whose Wald p-values are above 0.1
model1.1 <- update(model0, . ~ . - museums - art - reading - theater - movies
                   - concerts - music - yoga - tv)
anova(model1.1,model0,test="Chisq")
## Analysis of Deviance Table
##
## Model 1: dec_o ~ sports + tvsports + exercise + dining + hiking + gaming +
## clubbing + shopping
## Model 2: dec_o ~ sports + tvsports + exercise + dining + museums + art +
## hiking + gaming + clubbing + reading + tv + theater + movies +
## concerts + music + shopping + yoga
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1      5796     7794.2
## 2      5787     7777.9  9   16.305  0.06078 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
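The p-value in the deviance table comes from comparing the drop in deviance with a chi-squared distribution whose degrees of freedom equal the number of dropped terms. A quick check using the figures printed above (variable names are illustrative; the printed deviances are rounded, so the statistic comes out as 16.3 rather than 16.305):

```r
# Residual deviances and residual degrees of freedom as printed above.
dev_reduced <- 7794.2        # model1.1
dev_full    <- 7777.9        # model0
df_diff     <- 5796 - 5787   # number of dropped terms

# Likelihood-ratio statistic and its upper-tail chi-squared probability.
lr_stat <- dev_reduced - dev_full
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
print(c(lr_stat = lr_stat, df = df_diff, p_value = p_value))
```

A p-value above 0.05 means the nine dropped interests do not improve the fit significantly at that level, which supports using the smaller model.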
AIC(model0,model1.1)
## df AIC
## model0 18 7813.887
## model1.1 9 7812.192
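For a 0/1 response, the residual deviance of a logistic regression equals minus twice the log-likelihood, so the AIC is the deviance plus twice the number of coefficients. A small sketch checking the printed AICs against the rounded deviances from the deviance table (the helper `aic_from_deviance` is a made-up name):

```r
# AIC = residual deviance + 2 * (number of estimated coefficients).
# This shortcut holds for ungrouped binary data, where the saturated
# model has log-likelihood 0.
aic_from_deviance <- function(deviance, n_coef) deviance + 2 * n_coef

aic_model0   <- aic_from_deviance(7777.9, 18)  # 17 interests + intercept
aic_model1.1 <- aic_from_deviance(7794.2, 9)   # 8 interests + intercept
print(c(model0 = aic_model0, model1.1 = aic_model1.1))
```

The two values agree with the AIC() output up to the rounding of the deviances.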
#step(model1.1,.~.^2)
model.new <- glm(formula = dec_o ~ sports + tvsports + exercise +
                   dining + hiking + gaming + clubbing + shopping +
                   hiking:clubbing + dining:gaming + dining:clubbing +
                   dining:shopping + sports:tvsports +
                   tvsports:clubbing + gaming:clubbing +
                   sports:gaming + exercise:dining +
                   clubbing:shopping + exercise:hiking +
                   dining:hiking,
                 family = binomial(link = "logit"),
                 data = train)
AIC(model0,model1.1,model.new)
## df AIC
## model0 18 7813.887
## model1.1 9 7812.192
## model.new 21 7740.359
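The commented-out step() call suggests that the interaction terms in model.new were picked by a stepwise AIC search over pairwise interactions. A minimal sketch of that kind of search on simulated data (the data frame `sim`, its columns `x1`..`x3`, and the coefficients used to simulate `y` are all invented for illustration and are not the speed-dating data):

```r
set.seed(1)
n <- 500
# Simulated 1-10 ratings; x1..x3 are hypothetical stand-ins for real predictors.
sim <- data.frame(x1 = sample(1:10, n, replace = TRUE),
                  x2 = sample(1:10, n, replace = TRUE),
                  x3 = sample(1:10, n, replace = TRUE))
# Simulated binary outcome whose log-odds include an x1:x2 interaction.
sim$y <- rbinom(n, 1, plogis(-1.5 + 0.05 * sim$x1 * sim$x2 - 0.1 * sim$x3))

base_fit <- glm(y ~ x1 + x2 + x3, family = binomial(link = "logit"), data = sim)

# Stepwise AIC search; the scope permits every pairwise interaction.
stepped <- step(base_fit, scope = . ~ .^2, trace = 0)
print(formula(stepped))
print(AIC(base_fit, stepped))
```

A stepwise search only moves to a model with a lower AIC, so the selected model can never score worse than the starting one on that criterion. Whether it predicts better on new data is a separate question.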
Evaluate on the held-out 30% of the data:
tab
##    Ypred
## Y      0    1
##   0 1260  242
##   1  755  237
Accuracy = \(\frac{1260+237}{1260+237+755+242}=0.600\)
\(auc = 0.5692534\)
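Accuracy is the share of test cases on the diagonal of the confusion matrix. A small sketch that rebuilds the matrix printed above and also reports sensitivity and specificity (the name `tab_demo` is used so as not to imply this is the author's `tab` object):

```r
# Confusion matrix as printed above: rows = actual Y, columns = predicted Y.
tab_demo <- matrix(c(1260, 242,
                      755, 237),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Y = c("0", "1"), Ypred = c("0", "1")))

accuracy    <- sum(diag(tab_demo)) / sum(tab_demo)        # correct / total
sensitivity <- tab_demo["1", "1"] / sum(tab_demo["1", ])  # actual 1s predicted as 1
specificity <- tab_demo["0", "0"] / sum(tab_demo["0", ])  # actual 0s predicted as 0
print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity), 3))
```

The model predicts 0 for most cases, so it recovers fewer than a quarter of the actual "yes" decisions even though most "no" decisions are classified correctly.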
Can the outcome be predicted before the date?
No.
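Each attribute (attr, sinc, intel, fun, amb, shar) is scored from 1 to 10.
Whether the rater is willing to meet the partner again (dec) is the response variable.
\(dec_{pred}=\beta_0+\beta_1\ attr+\beta_2\ sinc+\beta_3\ intel+\beta_4\ fun+\beta_5\ amb+\beta_6\ shar\)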
## glm(formula = dec ~ ., family = binomial(link = "logit"), data = train2)
##                Estimate Std. Error     z value      Pr(>|z|)
## (Intercept) -5.22889632 0.22344214 -23.4015674 4.119389e-121
## attr         0.54920080 0.02574977  21.3283767 6.191958e-101
## sinc        -0.09626930 0.02961857  -3.2503017  1.152826e-03
## intel       -0.01118552 0.03614843  -0.3094331  7.569921e-01
## fun          0.27111068 0.02823758   9.6010599  7.912652e-22
## amb         -0.15745055 0.02805145  -5.6129191  1.989416e-08
## shar         0.26755610 0.02221556  12.0436353  2.095150e-33
model01<-update(model00,.~.-intel)
## glm(formula = dec ~ attr + sinc + fun + amb + shar, family = binomial(link = "logit"),
## data = train2)
##               Estimate Std. Error    z value      Pr(>|z|)
## (Intercept) -5.2592939 0.21234630 -24.767533 2.007129e-135
## attr         0.5497900 0.02573646  21.362300 2.996857e-101
## sinc        -0.1004672 0.02615828  -3.840742  1.226630e-04
## fun          0.2707435 0.02809545   9.636563  5.603269e-22
## amb         -0.1605729 0.02591154  -6.196963  5.756298e-10
## shar         0.2669980 0.02219823  12.027897  2.535355e-33
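To see what these estimates imply, the fitted log-odds can be converted to a probability with the inverse logit. A sketch for one hypothetical set of ratings (the values in `ratings` are invented for illustration; the coefficients are copied from the table above):

```r
# Coefficients of model01 as printed above.
coefs <- c("(Intercept)" = -5.2592939, attr = 0.5497900, sinc = -0.1004672,
           fun = 0.2707435, amb = -0.1605729, shar = 0.2669980)

# Hypothetical ratings of a partner on the 1-10 scale (illustrative only).
ratings <- c(attr = 8, sinc = 7, fun = 7, amb = 6, shar = 6)

# Linear predictor (log-odds), then inverse logit to get a probability.
log_odds <- coefs[["(Intercept)"]] + sum(coefs[names(ratings)] * ratings)
prob_yes <- plogis(log_odds)
print(round(c(log_odds = log_odds, prob_yes = prob_yes), 3))
```

Changing `attr` in `ratings` moves the result the most, since attr has the largest estimate per rating point.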
AIC(model00,model01)
## df AIC
## model00 7 5032.409
## model01 6 5035.351
Evaluate on the held-out 30% of the data:
tab
##    Ypred
## Y     0   1
##   0 940 247
##   1 293 623
Accuracy = \(\frac{940+623}{940+623+293+247}=0.743\)
Accuracy is considerably higher than with the interests-only model.
\(auc = 0.822\)
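The code that builds `tab` and computes the AUC is not shown. A minimal base-R sketch of those steps on simulated data: split 70/30, fit, classify at a 0.5 cutoff (an assumed threshold), tabulate, and compute the AUC through its rank (Mann-Whitney) form. All object names, the seed, and the simulated coefficients are illustrative, and the original analysis may have used a dedicated package instead.

```r
set.seed(42)
n <- 2000
# Simulated 1-10 ratings and a binary decision; not the speed-dating data.
sim <- data.frame(attr = sample(1:10, n, replace = TRUE),
                  fun  = sample(1:10, n, replace = TRUE))
sim$dec <- rbinom(n, 1, plogis(-4 + 0.45 * sim$attr + 0.25 * sim$fun))

# 70/30 split, fit on the training rows, predict probabilities on the rest.
idx   <- sample(n, size = floor(0.7 * n))
fit   <- glm(dec ~ attr + fun, family = binomial(link = "logit"),
             data = sim[idx, ])
p_hat <- predict(fit, newdata = sim[-idx, ], type = "response")
y     <- sim$dec[-idx]

# Confusion matrix at the assumed 0.5 cutoff, and the resulting accuracy.
y_pred   <- as.integer(p_hat > 0.5)
tab_demo <- table(Y = y, Ypred = y_pred)
accuracy <- mean(y_pred == y)

# AUC = probability that a random positive outranks a random negative.
auc_rank <- function(score, label) {
  r  <- rank(score)                 # average ranks handle ties
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc <- auc_rank(p_hat, y)
print(tab_demo)
print(round(c(accuracy = accuracy, auc = auc), 3))
```

Unlike accuracy, the AUC does not depend on the cutoff, which is why the two measures can move differently between models.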
Next, take pairwise interactions between the variables into account:
model02<-glm(formula = dec ~ attr + sinc + intel + fun + amb + shar +
fun:amb + intel:shar + attr:shar, family = binomial(link = "logit"),
data = train2)
In addition to the original variables, three interaction terms are included:
fun:amb, intel:shar, and attr:shar.
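In R's formula syntax, a term such as attr:shar adds a column equal to the product of the two predictors, so the effect of one rating is allowed to depend on the other. A small sketch with made-up ratings (the data frame `toy` is illustrative only) showing the design-matrix columns this creates:

```r
# Three made-up rating pairs (illustrative values only).
toy <- data.frame(attr = c(8, 5, 3), shar = c(6, 7, 2))

# Design matrix for main effects plus the attr:shar interaction.
X <- model.matrix(~ attr + shar + attr:shar, data = toy)
print(X)
```

With the interaction in the model, the change in log-odds per point of attr is \(\beta_{attr}+\beta_{attr:shar}\ shar\) rather than a single constant.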
tab
##    Ypred
## Y     0   1
##   0 943 244
##   1 288 628
Accuracy = \(\frac{943+628}{943+628+288+244}=0.747\)
\(auc = 0.8205\)
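Before the date, interests alone are a poor predictor of whether things will go any further (test AUC of about 0.57).
In the model based on ratings of the partner, higher scores on "Sincere" and "Ambitious"
are associated with a lower chance of continuing, as the negative sinc and amb coefficients show.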