Project Description

The outcome variable is Attending lawful demonstrations in the past 12 months (the question is “Now I’d like you to look at this card. I’m going to read out some different forms of political action that people can take, and I’d like you to tell me, for each one, whether you have actually done any of these things, whether you might do it or would never, under any circumstances, do it. Attending peaceful demonstrations”)

The data and its description

We will model participation in lawful demonstrations Q211 (as a form of political activity) by the respondent’s education, type of settlement, size of settlement, income, age, and interest in politics, controlling for the federal district.

df <- readRDS("ruwvs")
#str(df)
df$age <- as.numeric(df$age)
summary(df)

##  Q211                      intinpol               eduR    
##  0:1072   Very interested      :148   Primary       : 35  
##  1: 738   Somewhat interested  :638   Secondary     :469  
##           Not very interested  :713   Post-secondary:692  
##           Not at all interested:305   Tertiary      :604  
##           NA's                 :  6   NA's          : 10  
##                                                           
##                                                           
##             townsize  
##  Under 5,000    :391  
##  5000-20000     :203  
##  20000-100000   :284  
##  100000-500000  :342  
##  500000 and more:590  
##                       
##                       
##                                                   settlement 
##  Capital city                                          :149  
##  Regional center                                       :569  
##  District center                                       :542  
##  Another city, town (not a regional or district center): 58  
##  Village                                               :492  
##                                                              
##                                                              
##           income                                        region   
##  Fifth step  :385   RU: Central Federal District           :489  
##  Sixth step  :278   RU: Volga; Privolzhsky Federal District:382  
##  Fourth step :275   RU: Siberian Federal District          :239  
##  Third step  :262   RU: South Federal District             :183  
##  Seventh step:183   RU: North West Federal District        :177  
##  (Other)     :364   RU: Urals Federal District             :156  
##  NA's        : 63   (Other)                                :184  
##       age           income1        eduT     
##  Min.   : 3.00   Min.   :0.000   0   :1196  
##  1st Qu.:16.00   1st Qu.:2.000   1   : 604  
##  Median :28.00   Median :4.000   NA's:  10  
##  Mean   :30.41   Mean   :3.773              
##  3rd Qu.:43.00   3rd Qu.:5.000              
##  Max.   :76.00   Max.   :9.000              
##                  NA's   :63

Respondents’ socio-demographic charactreristics

Age histogram below is slightly right-skewed, which means that the sample has mostly been filled by younger generation. The average respondent’s age is 30 years old. As for the data about respondents’ income - it is normally disctrubuted, without strong deviations to one or another side. In education area the majority of respondents have post-secondary (38.25) or tetriary (33.4%) education. 26% of respondents have secondary education and only 2% of the sample have just primary education.

library(dplyr)
library(ggplot2)
# respondent's background

#age
ggplot() + 
  geom_histogram(data = df, 
                 aes(x = age),
                 binwidth = 5, 
                 fill = "cornflowerblue", 
                 col= "black", 
                 alpha = 0.5) +
  labs(subtitle="mean=30.41 (white solid line), median=28 (white dashed line),\nmin.=3, 1st Qu.=16, 3rd Qu.=43, max.=76",
       title = "Respondents' age", 
       x = "",
       y = "Number of respondents") +
  geom_vline(aes(xintercept = mean(df$age)), linetype="solid", color="white", size=1.5) +
  geom_vline(aes(xintercept = median(df$age)), linetype="dashed", color="white", size=1.5) +
  theme_bw()

#income
ggplot() + 
  geom_histogram(data=df, 
                 aes(x=income1), 
                 binwidth = 1, 
                 fill = "cornflowerblue", 
                 col= "black", 
                 alpha = 0.5) +
  labs(subtitle="mean=3.7 (white solid line), median=4 (white dashed line),\nmin.=0, 1st Qu.=2, 3rd Qu.=5, max.=9",
       title = "Respondents' income (10 level scale)", 
       x = "",
       y = "Number of respondents") +
  geom_vline(aes(xintercept = mean(df$income1)), linetype="solid", color="white", size=1.5) +
  geom_vline(aes(xintercept = median(df$income1)), linetype="dashed", color="white", size=1.5) +
  theme_bw()

#education
ggplot(data = df, aes(x = eduR)) +
  geom_bar(aes(y = prop.table(..count..) * 100),
             position = "dodge", color="black", fill="cornflowerblue",alpha = 0.7, width=0.8) +
  labs(title = "Respondents' levels of education", x = "", y = "percent", caption = "WVS 2017 Russia")+
  geom_text(aes(y = prop.table(..count..) * 100 + 0.5, 
                  label = paste0(round(prop.table(..count..) * 100,digits= 2), '%')), 
              stat = 'count', 
              position = position_dodge(), 
              size = 4) + 
  theme_bw()

Geographical characteristics:

All respondents are from Russia, mostly from Central Federal District (27%). Distribution over other districs can be seen on the graph below. The majority of them are from district (29.9%) or regional centers(31.4%) and villages(27.2%); respondnts’ towns are mostly large (32.6% have townsize 500.000+) or, in contrast, small (21.6% have townsize under 5.000).

# respondents' place of living 

#region
df <- within(df, region <- factor(region,
                                  levels=names(sort(table(region),
                                                    decreasing=FALSE))))
ggplot(data = df, aes(x = region)) +
  geom_bar(aes(y = prop.table(..count..) * 100),
             position = "dodge", color="black", fill="cornflowerblue", alpha = 0.7, width=0.8) +
  coord_flip() +
  labs(title = "Respondents' region", x = "", y = "percent", caption = "WVS 2017 Russia")+
  geom_text(aes(y = prop.table(..count..) * 100  - 1.5, 
                  label = paste0(round(prop.table(..count..) * 100,digits= 2), '%')), 
              stat = 'count', 
              #position = position_dodge(), 
              size = 4) + 
  theme_bw()

#settlement
ggplot(data = df, aes(x = settlement)) +
  geom_bar(aes(y = prop.table(..count..) * 100),
             position = "dodge", color="black", fill="cornflowerblue", alpha = 0.7, width=0.8) +
  coord_flip() +
  labs(title = "Respondents' settlement", x = "", y = "percent", caption = "WVS 2017 Russia")+
  geom_text(aes(y = prop.table(..count..) * 100  - 1.5, 
                  label = paste0(round(prop.table(..count..) * 100,digits= 2), '%')), 
              stat = 'count', 
              #position = position_dodge(), 
              size = 4) + 
  theme_bw()

#townsize  
ggplot(data = df, aes(x = townsize)) +
  geom_bar(aes(y = prop.table(..count..) * 100),
             position = "dodge", color="black", fill="cornflowerblue",alpha = 0.7, width=0.8) +
  coord_flip() +
  labs(title = "Respondents' town's size", x = "", y = "percent", caption = "WVS 2017 Russia")+
  geom_text(aes(y = ..count../sum(..count..)*100 - 1.5, 
                  label = paste0(round(..count../sum(..count..)*100,digits= 2), '%')), 
              stat = 'count', 
              #position = position_dodge(), 
              size = 4) + 
  theme_bw()

When it comes to respondents’ interest in politics it can be seen that 2/5 of the sample are not really interested in politics. Taking into account those, who are not interested in politics at all, it takes 56% of the sample of those, who ignore politics.

# interest in politics and political activity

ggplot(df, 
       aes(x = intinpol, y=..count../sum(..count..)*100, 
           fill = Q211)) +
  geom_bar(position = "stack")+
  labs(title = "Respondents' interest in politics", x = "", y = "percent", fill = "Politically active?", caption = "WVS 2017 Russia")+
  geom_text(aes(y = prop.table(..count..) * 100 +1, 
                  label = paste0(round(prop.table(..count..) * 100,digits= 2), '%')), 
              stat = 'count', 
              position = position_stack(vjust = 0.5), 
              size = 4) + 
  coord_flip() +
  theme_bw() +
  scale_fill_manual(values=c("sienna3", "skyblue3"))

Logistic regression

library(polycor)
df_cor <- hetcor(df)
df_cor

## 
## Two-Step Estimates
## 
## Correlations/Type of Correlation:
##                 Q211   intinpol       eduR   townsize settlement
## Q211               1 Polychoric Polychoric Polychoric Polychoric
## intinpol        -0.2          1 Polychoric Polychoric Polychoric
## eduR         0.07058    -0.1151          1 Polychoric Polychoric
## townsize   -0.002576    0.03294     0.2528          1 Polychoric
## settlement   -0.1093    -0.0178    -0.2516    -0.9847          1
## income        -0.023    -0.1356     0.3034     0.1048   -0.09324
## region        0.1525     -0.018     0.1033    0.05894    -0.2217
## age          0.02156     -0.117    -0.2265    -0.0947    0.08369
## income1     -0.02156    -0.1332     0.3006     0.1057   -0.09373
## eduT         0.08664   -0.08995       <NA>     0.2702     -0.267
##                income     region        age    income1       eduT
## Q211       Polychoric Polychoric Polyserial Polyserial Polychoric
## intinpol   Polychoric Polychoric Polyserial Polyserial Polychoric
## eduR       Polychoric Polychoric Polyserial Polyserial Polychoric
## townsize   Polychoric Polychoric Polyserial Polyserial Polychoric
## settlement Polychoric Polychoric Polyserial Polyserial Polychoric
## income              1 Polychoric Polyserial Polyserial Polychoric
## region         0.0376          1 Polyserial Polyserial Polychoric
## age            -0.276    0.03911          1    Pearson Polyserial
## income1        0.9999    0.03637    -0.2745          1 Polyserial
## eduT           0.3423     0.1176      -0.23     0.3373          1
## 
## Standard Errors:
##               Q211 intinpol    eduR townsize settlement   income  region
## Q211                                                                    
## intinpol   0.03144                                                      
## eduR       0.03305  0.02771                                             
## townsize   0.03279  0.02776 0.02654                                     
## settlement 0.03207  0.02738  0.0262 0.001632                            
## income     0.03074  0.02554 0.02368  0.02599     0.0256                 
## region     0.03103  0.02677 0.02712  0.02681    0.02561  0.02528        
## age        0.03032  0.02527 0.02439  0.02562    0.02548  0.02182 0.02508
## income1    0.03032  0.02513 0.02326  0.02559    0.02524 0.002691 0.02494
## eduT       0.03911  0.03323       0  0.03175    0.03127  0.02811 0.03218
##                age income1
## Q211                      
## intinpol                  
## eduR                      
## townsize                  
## settlement                
## income                    
## region                    
## age                       
## income1     0.0222        
## eduT       0.03033 0.02741
## 
## n = 1736 
## 
## P-values for Tests of Bivariate Normality:
##                 Q211  intinpol      eduR              townsize settlement
## Q211                                                                     
## intinpol     0.02881                                                     
## eduR          0.7348    0.1433                                           
## townsize    0.001121   0.06721    0.1324                                 
## settlement 9.393e-06  0.008892   0.00703             7.572e-12           
## income        0.6202  0.006957 7.566e-07               0.01495  8.763e-06
## region        0.2317 1.557e-08 8.174e-08              3.28e-25 5.092e-110
## age        2.023e-10 7.045e-10 4.085e-31            1.423e-280  1.666e-31
## income1    3.558e-42 2.126e-42 3.611e-42 7.56899999860163e-315   8.22e-66
## eduT            <NA>    0.2291         0                0.5738     0.1498
##               income     region       age   income1
## Q211                                               
## intinpol                                           
## eduR                                               
## townsize                                           
## settlement                                         
## income                                             
## region      1.23e-15                               
## age         1.07e-11 3.381e-213                    
## income1          NaN 8.684e-253 8.425e-52          
## eduT       8.759e-05  0.0003211 1.846e-20 2.362e-42

library(ggcorrplot)
df_corplot <- round(df_cor$correlations, 2)
ggcorrplot(df_corplot, type = "lower")

First, we see strong correlation between two income variables and between townsize and settlement. This means that for the model we should choose one of these two variables. Thus I will exclude “townsize” variable, because it is less corralated with the outcome than settlement, and factor income variable, as it is less convenient with the same correlational level. Also I will exclude “eduR” variable, leaving binary variable to the regression for the same reason.

#general model
model_full <- glm(Q211 ~ age + income1 + eduT + settlement + region + intinpol, data = df, family="binomial"(link=logit)) 
summary(model_full) #AIC: 2283.4

## 
## Call:
## glm(formula = Q211 ~ age + income1 + eduT + settlement + region + 
##     intinpol, family = binomial(link = logit), data = df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8052  -1.0272  -0.7765   1.2232   2.0811  
## 
## Coefficients:
##                                                                    Estimate
## (Intercept)                                                       0.8221265
## age                                                              -0.0009979
## income1                                                          -0.0544330
## eduT1                                                             0.1788241
## settlementRegional center                                        -0.4597112
## settlementDistrict center                                        -0.8721639
## settlementAnother city, town (not a regional or district center) -0.9810863
## settlementVillage                                                -0.5133248
## regionRU: North Caucasian federal district                        0.1437896
## regionRU: Urals Federal District                                 -0.1227691
## regionRU: North West Federal District                             0.1589217
## regionRU: South Federal District                                 -0.3330271
## regionRU: Siberian Federal District                               0.3835438
## regionRU: Volga; Privolzhsky Federal District                     0.3173954
## regionRU: Central Federal District                                0.4590180
## intinpolSomewhat interested                                      -0.4609568
## intinpolNot very interested                                      -0.6365091
## intinpolNot at all interested                                    -1.4088843
##                                                                  Std. Error
## (Intercept)                                                       0.4070988
## age                                                               0.0031580
## income1                                                           0.0289287
## eduT1                                                             0.1138132
## settlementRegional center                                         0.2254509
## settlementDistrict center                                         0.2170352
## settlementAnother city, town (not a regional or district center)  0.3581382
## settlementVillage                                                 0.2284048
## regionRU: North Caucasian federal district                        0.3372543
## regionRU: Urals Federal District                                  0.3013116
## regionRU: North West Federal District                             0.2952891
## regionRU: South Federal District                                  0.3011301
## regionRU: Siberian Federal District                               0.2818394
## regionRU: Volga; Privolzhsky Federal District                     0.2694427
## regionRU: Central Federal District                                0.2716458
## intinpolSomewhat interested                                       0.1937967
## intinpolNot very interested                                       0.1931379
## intinpolNot at all interested                                     0.2251833
##                                                                  z value
## (Intercept)                                                        2.019
## age                                                               -0.316
## income1                                                           -1.882
## eduT1                                                              1.571
## settlementRegional center                                         -2.039
## settlementDistrict center                                         -4.019
## settlementAnother city, town (not a regional or district center)  -2.739
## settlementVillage                                                 -2.247
## regionRU: North Caucasian federal district                         0.426
## regionRU: Urals Federal District                                  -0.407
## regionRU: North West Federal District                              0.538
## regionRU: South Federal District                                  -1.106
## regionRU: Siberian Federal District                                1.361
## regionRU: Volga; Privolzhsky Federal District                      1.178
## regionRU: Central Federal District                                 1.690
## intinpolSomewhat interested                                       -2.379
## intinpolNot very interested                                       -3.296
## intinpolNot at all interested                                     -6.257
##                                                                  Pr(>|z|)
## (Intercept)                                                      0.043438
## age                                                              0.752001
## income1                                                          0.059886
## eduT1                                                            0.116135
## settlementRegional center                                        0.041443
## settlementDistrict center                                        5.86e-05
## settlementAnother city, town (not a regional or district center) 0.006155
## settlementVillage                                                0.024612
## regionRU: North Caucasian federal district                       0.669850
## regionRU: Urals Federal District                                 0.683678
## regionRU: North West Federal District                            0.590446
## regionRU: South Federal District                                 0.268759
## regionRU: Siberian Federal District                              0.173558
## regionRU: Volga; Privolzhsky Federal District                    0.238809
## regionRU: Central Federal District                               0.091073
## intinpolSomewhat interested                                      0.017381
## intinpolNot very interested                                      0.000982
## intinpolNot at all interested                                    3.93e-10
##                                                                     
## (Intercept)                                                      *  
## age                                                                 
## income1                                                          .  
## eduT1                                                               
## settlementRegional center                                        *  
## settlementDistrict center                                        ***
## settlementAnother city, town (not a regional or district center) ** 
## settlementVillage                                                *  
## regionRU: North Caucasian federal district                          
## regionRU: Urals Federal District                                    
## regionRU: North West Federal District                               
## regionRU: South Federal District                                    
## regionRU: Siberian Federal District                                 
## regionRU: Volga; Privolzhsky Federal District                       
## regionRU: Central Federal District                               .  
## intinpolSomewhat interested                                      *  
## intinpolNot very interested                                      ***
## intinpolNot at all interested                                    ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2357.9  on 1735  degrees of freedom
## Residual deviance: 2247.4  on 1718  degrees of freedom
##   (74 observations deleted due to missingness)
## AIC: 2283.4
## 
## Number of Fisher Scoring iterations: 4

#and find the best model, given your data

From the summary of the first general logistic regression model it can be seen that significant relation provide settlement and interest in politics. For that reason I have tried model with only two such variables - it provides significant relation of predictors with the outcome, but its AIC is much higher.

# settlement and interest in politics only
model_2 <- glm(Q211 ~ settlement + intinpol, data = df, family="binomial"(link=logit)) 
summary(model_2) #AIC: 2371.6

## 
## Call:
## glm(formula = Q211 ~ settlement + intinpol, family = binomial(link = logit), 
##     data = df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6126  -1.0390  -0.7803   1.2656   1.8061  
## 
## Coefficients:
##                                                                  Estimate
## (Intercept)                                                        0.9823
## settlementRegional center                                         -0.7572
## settlementDistrict center                                         -1.1291
## settlementAnother city, town (not a regional or district center)  -1.1371
## settlementVillage                                                 -0.8334
## intinpolSomewhat interested                                       -0.3538
## intinpolNot very interested                                       -0.4835
## intinpolNot at all interested                                     -1.2584
##                                                                  Std. Error
## (Intercept)                                                          0.2339
## settlementRegional center                                            0.1922
## settlementDistrict center                                            0.1939
## settlementAnother city, town (not a regional or district center)     0.3287
## settlementVillage                                                    0.1946
## intinpolSomewhat interested                                          0.1853
## intinpolNot very interested                                          0.1838
## intinpolNot at all interested                                        0.2144
##                                                                  z value
## (Intercept)                                                        4.199
## settlementRegional center                                         -3.939
## settlementDistrict center                                         -5.822
## settlementAnother city, town (not a regional or district center)  -3.459
## settlementVillage                                                 -4.283
## intinpolSomewhat interested                                       -1.909
## intinpolNot very interested                                       -2.631
## intinpolNot at all interested                                     -5.871
##                                                                  Pr(>|z|)
## (Intercept)                                                      2.68e-05
## settlementRegional center                                        8.19e-05
## settlementDistrict center                                        5.81e-09
## settlementAnother city, town (not a regional or district center) 0.000541
## settlementVillage                                                1.85e-05
## intinpolSomewhat interested                                      0.056267
## intinpolNot very interested                                      0.008517
## intinpolNot at all interested                                    4.34e-09
##                                                                     
## (Intercept)                                                      ***
## settlementRegional center                                        ***
## settlementDistrict center                                        ***
## settlementAnother city, town (not a regional or district center) ***
## settlementVillage                                                ***
## intinpolSomewhat interested                                      .  
## intinpolNot very interested                                      ** 
## intinpolNot at all interested                                    ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2440.9  on 1803  degrees of freedom
## Residual deviance: 2355.6  on 1796  degrees of freedom
##   (6 observations deleted due to missingness)
## AIC: 2371.6
## 
## Number of Fisher Scoring iterations: 4

Then I have run model with the same two significant variables (settlement and interest in politics) and added age variable in one model and income in another. AIC is better in the model with income, although income is insignificant here, as well as age. However, for now I will take income variable into the model.

# age + settlement + intinpol
model_3_1 <- glm(Q211 ~ age + settlement + intinpol, data = df, family="binomial"(link=logit)) 
summary(model_3_1) #AIC: 2373.5

## 
## Call:
## glm(formula = Q211 ~ age + settlement + intinpol, family = binomial(link = logit), 
##     data = df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6198  -1.0444  -0.7783   1.2683   1.8098  
## 
## Coefficients:
##                                                                    Estimate
## (Intercept)                                                       0.9581686
## age                                                               0.0006893
## settlementRegional center                                        -0.7562321
## settlementDistrict center                                        -1.1300942
## settlementAnother city, town (not a regional or district center) -1.1311374
## settlementVillage                                                -0.8355760
## intinpolSomewhat interested                                      -0.3512133
## intinpolNot very interested                                      -0.4791006
## intinpolNot at all interested                                    -1.2532920
##                                                                  Std. Error
## (Intercept)                                                       0.2552441
## age                                                               0.0029240
## settlementRegional center                                         0.1923113
## settlementDistrict center                                         0.1940143
## settlementAnother city, town (not a regional or district center)  0.3296424
## settlementVillage                                                 0.1948539
## intinpolSomewhat interested                                       0.1856856
## intinpolNot very interested                                       0.1847034
## intinpolNot at all interested                                     0.2154344
##                                                                  z value
## (Intercept)                                                        3.754
## age                                                                0.236
## settlementRegional center                                         -3.932
## settlementDistrict center                                         -5.825
## settlementAnother city, town (not a regional or district center)  -3.431
## settlementVillage                                                 -4.288
## intinpolSomewhat interested                                       -1.891
## intinpolNot very interested                                       -2.594
## intinpolNot at all interested                                     -5.818
##                                                                  Pr(>|z|)
## (Intercept)                                                      0.000174
## age                                                              0.813632
## settlementRegional center                                        8.41e-05
## settlementDistrict center                                        5.72e-09
## settlementAnother city, town (not a regional or district center) 0.000600
## settlementVillage                                                1.80e-05
## intinpolSomewhat interested                                      0.058566
## intinpolNot very interested                                      0.009490
## intinpolNot at all interested                                    5.97e-09
##                                                                     
## (Intercept)                                                      ***
## age                                                                 
## settlementRegional center                                        ***
## settlementDistrict center                                        ***
## settlementAnother city, town (not a regional or district center) ***
## settlementVillage                                                ***
## intinpolSomewhat interested                                      .  
## intinpolNot very interested                                      ** 
## intinpolNot at all interested                                    ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2440.9  on 1803  degrees of freedom
## Residual deviance: 2355.5  on 1795  degrees of freedom
##   (6 observations deleted due to missingness)
## AIC: 2373.5
## 
## Number of Fisher Scoring iterations: 4

# income + settlement + intinpol
model_3_2 <- glm(Q211 ~ income1 + settlement + intinpol, data = df, family="binomial"(link=logit)) 
summary(model_3_2) #AIC: 2299

## 
## Call:
## glm(formula = Q211 ~ income1 + settlement + intinpol, family = binomial(link = logit), 
##     data = df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7385  -1.0498  -0.8044   1.2668   1.8603  
## 
## Coefficients:
##                                                                  Estimate
## (Intercept)                                                       1.26179
## income1                                                          -0.04819
## settlementRegional center                                        -0.75756
## settlementDistrict center                                        -1.13768
## settlementAnother city, town (not a regional or district center) -1.14404
## settlementVillage                                                -0.85176
## intinpolSomewhat interested                                      -0.42486
## intinpolNot very interested                                      -0.54943
## intinpolNot at all interested                                    -1.32210
##                                                                  Std. Error
## (Intercept)                                                         0.26340
## income1                                                             0.02642
## settlementRegional center                                           0.19453
## settlementDistrict center                                           0.19587
## settlementAnother city, town (not a regional or district center)    0.33594
## settlementVillage                                                   0.19767
## intinpolSomewhat interested                                         0.19018
## intinpolNot very interested                                         0.18854
## intinpolNot at all interested                                       0.21939
##                                                                  z value
## (Intercept)                                                        4.790
## income1                                                           -1.824
## settlementRegional center                                         -3.894
## settlementDistrict center                                         -5.808
## settlementAnother city, town (not a regional or district center)  -3.406
## settlementVillage                                                 -4.309
## intinpolSomewhat interested                                       -2.234
## intinpolNot very interested                                       -2.914
## intinpolNot at all interested                                     -6.026
##                                                                  Pr(>|z|)
## (Intercept)                                                      1.66e-06
## income1                                                           0.06815
## settlementRegional center                                        9.85e-05
## settlementDistrict center                                        6.31e-09
## settlementAnother city, town (not a regional or district center)  0.00066
## settlementVillage                                                1.64e-05
## intinpolSomewhat interested                                       0.02549
## intinpolNot very interested                                       0.00357
## intinpolNot at all interested                                    1.68e-09
##                                                                     
## (Intercept)                                                      ***
## income1                                                          .  
## settlementRegional center                                        ***
## settlementDistrict center                                        ***
## settlementAnother city, town (not a regional or district center) ***
## settlementVillage                                                ***
## intinpolSomewhat interested                                      *  
## intinpolNot very interested                                      ** 
## intinpolNot at all interested                                    ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2366.8  on 1742  degrees of freedom
## Residual deviance: 2281.0  on 1734  degrees of freedom
##   (67 observations deleted due to missingness)
## AIC: 2299
## 
## Number of Fisher Scoring iterations: 4

# age + income + settlement + intinpol
#model_3_3 <- glm(Q211 ~ income1 + settlement + intinpol + age, data = df, family="binomial"(link=logit)) 
#summary(model_3_3) #AIC: 2300, neither age, nor income gave here significant result, so I do not add this model into further comparison

Than I’ve decided which variables can I also try, to make the model better (because the smallest AIC is still provides the general model with lots of insignificant predictors, which I do not really like). I have added binary education variable. AIC is still bigger than from the full model, and education do not give significant result, BUT in this case income becomes significant predictor, which I do like.

# income + settlement + intinpol + edu
model_4 <- glm(Q211 ~ income1 + eduT + settlement + intinpol, data = df, family="binomial"(link=logit)) 
summary(model_4) #AIC: 2289

## 
## Call:
## glm(formula = Q211 ~ income1 + eduT + settlement + intinpol, 
##     family = binomial(link = logit), data = df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7889  -1.0431  -0.8065   1.2501   1.8931  
## 
## Coefficients:
##                                                                  Estimate
## (Intercept)                                                       1.20299
## income1                                                          -0.05731
## eduT1                                                             0.17166
## settlementRegional center                                        -0.72626
## settlementDistrict center                                        -1.10399
## settlementAnother city, town (not a regional or district center) -1.11736
## settlementVillage                                                -0.80521
## intinpolSomewhat interested                                      -0.41660
## intinpolNot very interested                                      -0.55018
## intinpolNot at all interested                                    -1.30746
##                                                                  Std. Error
## (Intercept)                                                         0.26584
## income1                                                             0.02733
## eduT1                                                               0.11166
## settlementRegional center                                           0.19527
## settlementDistrict center                                           0.19716
## settlementAnother city, town (not a regional or district center)    0.33658
## settlementVillage                                                   0.20051
## intinpolSomewhat interested                                         0.19050
## intinpolNot very interested                                         0.18888
## intinpolNot at all interested                                       0.21972
##                                                                  z value
## (Intercept)                                                        4.525
## income1                                                           -2.097
## eduT1                                                              1.537
## settlementRegional center                                         -3.719
## settlementDistrict center                                         -5.599
## settlementAnother city, town (not a regional or district center)  -3.320
## settlementVillage                                                 -4.016
## intinpolSomewhat interested                                       -2.187
## intinpolNot very interested                                       -2.913
## intinpolNot at all interested                                     -5.951
##                                                                  Pr(>|z|)
## (Intercept)                                                      6.03e-06
## income1                                                          0.035981
## eduT1                                                            0.124219
## settlementRegional center                                        0.000200
## settlementDistrict center                                        2.15e-08
## settlementAnother city, town (not a regional or district center) 0.000901
## settlementVillage                                                5.92e-05
## intinpolSomewhat interested                                      0.028748
## intinpolNot very interested                                      0.003581
## intinpolNot at all interested                                    2.67e-09
##                                                                     
## (Intercept)                                                      ***
## income1                                                          *  
## eduT1                                                               
## settlementRegional center                                        ***
## settlementDistrict center                                        ***
## settlementAnother city, town (not a regional or district center) ***
## settlementVillage                                                ***
## intinpolSomewhat interested                                      *  
## intinpolNot very interested                                      ** 
## intinpolNot at all interested                                    ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2357.9  on 1735  degrees of freedom
## Residual deviance: 2269.9  on 1726  degrees of freedom
##   (74 observations deleted due to missingness)
## AIC: 2289.9
## 
## Number of Fisher Scoring iterations: 4

Best Model (AIC comparison)

I have compared all the models by AIC. The smallest AIC has the full model, however there only two predictors are significant. For that reason I have started to try other model and compare AIC of those. In the result I can say that I prefer the last model, where income, settlement and interest in politics are significant predictors and one insignificant predictor - education. Below there is the table with all models’ AIC scores. I have also added a table with predictors where asterisk shows if the predictor was significant in that model. Below there will be the model equation.

# the lower the better
#AIC(model_full) # 2283.4
#AIC(model_2) # 2371.6
#AIC(model_3_1) # 2373.5
#AIC(model_3_2) # 2299
#AIC(model_4) # 2289

model <- c("model_full", "model_2", "model_3_1", "model_3_2", "model_4")
predictors <- c("interest in politics* + settlement* + income + age + edu + region", "interest in politics* + settlement*", "interest in politics* + settlement* + age", "interest in politics* + settlement* + income", "interest in politics* + settlement* + income* + edu")
AIC_score <- c("2283.4", "2371.6", "2373.5", "2299", "2289")
AIC_scores <- data.frame(model, predictors, AIC_score)
library(knitr)
library(kableExtra)
kable(AIC_scores) %>%
  kable_styling(c("bordered"))%>% 
  add_header_above(c("Comparison of logistic regression models by AIC"=3)) %>% 
  row_spec(5, background = "lightsteelblue")

Comparison of logistic regression models by AIC
model	predictors	AIC_score
model_full	interest in politics* + settlement* + income + age + edu + region	2283.4
model_2	interest in politics* + settlement*	2371.6
model_3_1	interest in politics* + settlement* + age	2373.5
model_3_2	interest in politics* + settlement* + income	2299
model_4	interest in politics* + settlement* + income* + edu	2289

Model Equation

#install.packages("equatiomatic")
library(equatiomatic)
equatiomatic::extract_eq(model_4, use_coefs = TRUE)

\[ \log\left[ \frac { P( \operatorname{Q211} = \operatorname{1} ) }{ 1 - P( \operatorname{Q211} = \operatorname{1} ) } \right] = 1.2 - 0.06(\operatorname{income1}) + 0.17(\operatorname{eduT}_{\operatorname{1}}) - 0.73(\operatorname{settlement}_{\operatorname{Regional\ center}}) - 1.1(\operatorname{settlement}_{\operatorname{District\ center}}) - 1.12(\operatorname{settlement}_{\operatorname{Another\ city,\ town\ (not\ a\ regional\ or\ district\ center)}}) - 0.81(\operatorname{settlement}_{\operatorname{Village}}) - 0.42(\operatorname{intinpol}_{\operatorname{Somewhat\ interested}}) - 0.55(\operatorname{intinpol}_{\operatorname{Not\ very\ interested}}) - 1.31(\operatorname{intinpol}_{\operatorname{Not\ at\ all\ interested}}) + \epsilon \]

Interpretation

So, as we remember, the significant predictors here are interest in politics, settlement and income. According to the odds ratio (below) the following can be concluded.

exp(cbind(OR = coef(model_4), confint(model_4)))

##                                                                         OR
## (Intercept)                                                      3.3300552
## income1                                                          0.9443019
## eduT1                                                            1.1872700
## settlementRegional center                                        0.4837158
## settlementDistrict center                                        0.3315453
## settlementAnother city, town (not a regional or district center) 0.3271412
## settlementVillage                                                0.4469928
## intinpolSomewhat interested                                      0.6592834
## intinpolNot very interested                                      0.5768451
## intinpolNot at all interested                                    0.2705063
##                                                                      2.5 %
## (Intercept)                                                      1.9869893
## income1                                                          0.8948839
## eduT1                                                            0.9536530
## settlementRegional center                                        0.3284886
## settlementDistrict center                                        0.2242568
## settlementAnother city, town (not a regional or district center) 0.1666414
## settlementVillage                                                0.3004486
## intinpolSomewhat interested                                      0.4528583
## intinpolNot very interested                                      0.3974378
## intinpolNot at all interested                                    0.1751707
##                                                                     97.5 %
## (Intercept)                                                      5.6391781
## income1                                                          0.9961249
## eduT1                                                            1.4775889
## settlementRegional center                                        0.7070315
## settlementDistrict center                                        0.4862669
## settlementAnother city, town (not a regional or district center) 0.6265489
## settlementVillage                                                0.6600712
## intinpolSomewhat interested                                      0.9566525
## intinpolNot very interested                                      0.8342848
## intinpolNot at all interested                                    0.4148800

Income:

when income increases by one unit, the odds of attending lawful demonstrations change by a factor of 0.94 = they multiply by 0.94, or the chance of attanding the lawful demonstration decreases by 6 per cent;

Settlement:

when respondent is from regional center, their odds of attending lawful demonstrations change by a factor of 0.48 or decreases by 52%, as compared to the respondents from capital city;
when respondent is from district center, their odds of attending lawful demonstrations change by a factor of 0.33 or decreases by 67%, as compared to the respondents from capital city;
when respondent is from another city, their odds of attending lawful demonstrations change by a factor of 0.33 or decreases by 67%, as compared to the respondents from capital city;
when respondent is from village, their odds of attending lawful demonstrations change by a factor of 0.45 or decreases by 55%, as compared to the respondents from capital city;

Interest in politics:

when respondent is somewhat interested in politics, their odds of attending lawful demonstrations change by a factor of 0.66 or decreases by 34%, as compared to the respondents who are very interested in politics;
when respondent is not very interested in politics, their odds of attending lawful demonstrations change by a factor of 0.58 or decreases by 42%, as compared to those, who are very interested in politics;
when respondent is not interested in politics at all, their odds of attending lawful demonstrations change by a factor of 0.27 or decreases by 73%, as compared to those, who are very interested in politics.

Prediction

Below there is the calculation the predicted probability of participating in lawful demonstrations at each value of interest in politics, holding income at its mean. At this step I have decided to stop with two predictors only. From the graph below it can bу seen that the probability of participation in peaceful demonstrations is lower with hogher income of respondents, as well as decrease in respondent’s interest in politics give negative effect, too (respondent, who are not interested in politics are less likely to participate in demostrations and, vice versa, for respondents, who are interested in politics, probability to participate in lawful demonstrations is higher than of those, who are not interested.)

newdata1 <- with(df, data.frame(income1 = mean(income1, na.rm=TRUE), 
                        intinpol = c("Very interested", "Somewhat interested", "Not very interested", "Not at all interested")))
newdata1

model_new <- glm(Q211 ~ income1 + intinpol, data = df, family="binomial"(link=logit)) 
#summary(model_new)

newdata1$intinpolP <- predict(model_new, newdata = newdata1, type = "response")
#newdata1

newdata2 <- with(df, data.frame(income1 = rep(seq(from = 1, to = 8, length.out = 100), 4), 
                                intinpol = factor(rep(c("Very interested", "Somewhat interested", "Not very interested", "Not at all interested"), each = 100))))

newdata3 <- cbind(newdata2, predict(model_new, newdata = newdata2, type = "response", se = T))
#newdata3

newdata3 <- within(newdata3, {
  PredictedProb <- fit
  LL <- fit - (1.96 * se.fit)
  UL <- fit + (1.96 * se.fit)
})
#head(newdata3)

ggplot(newdata3, aes(x = income1, y = PredictedProb)) +
  geom_ribbon(aes(ymin = LL, ymax = UL, fill = intinpol), alpha = 0.2) +
  geom_line(aes(colour = intinpol), size = 0.75) +
  ylim(0, 1) +
  theme_bw()

Accuracy

Accuracy for the model, that I have chosen is 0.6, which is not really great, but still not bad. In other words, the percentage of correct predictions is 60.48%. This tells us that for the 1164 observations (respondent) used in the model, the model correctly predicted whether or not somebody attended lawful demonstration 60.48% of the time.

library(caret)
set.seed(3456)
trainIndex <- createDataPartition(df$Q211, p = .67,
                                  list = FALSE,
                                  times = 1)

Train <- df[ trainIndex,] %>% na.omit()
Test  <- df[trainIndex,] %>% na.omit()

#Train
#Test

model_train <- glm(Q211 ~ income1 + eduT + settlement + intinpol, data = Train, family=binomial)
#summary(model_train)
Test$model_prob <- predict(model_train, Test, type = "response")
Test <- Test  %>% mutate(model_pred = 1*(model_prob > .53) + 0,
                                 visit_binary = 1*(Q211 == "1") + 0)
Test <- Test %>% mutate(accurate = 1*(model_pred == visit_binary))
#str(Test)
sum(Test$accurate)/nrow(Test)

## [1] 0.6043103

Diagnostics

No additional predictors that are statistically significant;
Marginal model plots look nice;
GVIFs are ~1, which is good

library(car)
car::residualPlots(model_4)

##            Test stat Pr(>|Test stat|)
## income1       0.6031           0.4374
## eduT                                 
## settlement                           
## intinpol

car::marginalModelPlots(model_4)

vif(model_4)

##                GVIF Df GVIF^(1/(2*Df))
## income1    1.111424  1        1.054241
## eduT       1.113826  1        1.055380
## settlement 1.066610  4        1.008093
## intinpol   1.042529  3        1.006966

Data Imputation

Looking at missing data pattern:

df <- readRDS("ruwvs")
#install.packages("mice")
library(mice)
md.pattern(df)

##      Q211 townsize settlement region age intinpol eduR eduT income income1
## 1736    1        1          1      1   1        1    1    1      1       1
## 59      1        1          1      1   1        1    1    1      0       0
## 7       1        1          1      1   1        1    0    0      1       1
## 2       1        1          1      1   1        1    0    0      0       0
## 4       1        1          1      1   1        0    1    1      1       1
## 1       1        1          1      1   1        0    1    1      0       0
## 1       1        1          1      1   1        0    0    0      0       0
##         0        0          0      0   0        6   10   10     63      63
##         
## 1736   0
## 59     2
## 7      2
## 2      4
## 4      1
## 1      3
## 1      5
##      152

df_MICE <- mice(df,m=5,maxit=50,meth='pmm',seed=500)

summary(df_MICE)

## Class: mids
## Number of multiple imputations:  5 
## Imputation methods:
##       Q211   intinpol       eduR   townsize settlement     income 
##         ""      "pmm"      "pmm"         ""         ""      "pmm" 
##     region        age    income1       eduT 
##         ""         ""         ""      "pmm" 
## PredictorMatrix:
##            Q211 intinpol eduR townsize settlement income region age
## Q211          0        1    1        1          1      1      1   1
## intinpol      1        0    1        1          1      1      1   1
## eduR          1        1    0        1          1      1      1   1
## townsize      1        1    1        0          1      1      1   1
## settlement    1        1    1        1          0      1      1   1
## income        1        1    1        1          1      0      1   1
##            income1 eduT
## Q211             0    1
## intinpol         0    1
## eduR             0    1
## townsize         0    1
## settlement       0    1
## income           0    1
## Number of logged events:  1223 
##   it im      dep      meth
## 1  0  0          collinear
## 2  1  1 intinpol       pmm
## 3  1  1     eduR       pmm
## 4  1  1   income       pmm
## 5  1  1   income       pmm
## 6  1  1     eduT       pmm
##                                                                                                                                                                                                                                                                                                                       out
## 1                                                                                                                                                                                                                                                                                                                 income1
## 2                                                                                                                                                                                                                                    age17, age35, age89, age92, age93, age94, age95, age96, age98, age99, age100, age103
## 3                                                                                                                                                                                                                                    age17, age40, age89, age92, age93, age94, age95, age96, age98, age99, age100, age103
## 4                                                                                                                                                                                                                             age17, age88, age89, age92, age93, age94, age95, age96, age98, age99, age100, age103, eduT1
## 5 * A ridge penalty had to be used to calculate the inverse crossproduct of the predictor matrix. Please remove duplicate variables or unique respondent names/numbers from the imputation model. It may be advisable to check the fraction of missing information (fmi) to evaluate the validity of the imputation model
## 6                                                                                                                                                                                                                      eduRTertiary, age17, age35, age89, age92, age93, age94, age95, age96, age98, age99, age100, age103

completedData <- complete(df_MICE,1)
xyplot(df_MICE,Q211 ~ income1 + eduT + settlement + intinpol,pch=18,cex=1)

stripplot(df_MICE, pch = 20, cex = 1.2)

The similar can be done with VIM also:

library(VIM)
aggr_plot <- aggr(df, col=c('cornflowerblue','sienna3'), numbers=TRUE, sortVars=TRUE, labels=names(data), cex.axis=.7, gap=3, ylab=c("Histogram of missing data","Pattern"))

## 
##  Variables sorted by number of missings: 
##    Variable       Count
##      income 0.034806630
##     income1 0.034806630
##        eduR 0.005524862
##        eduT 0.005524862
##    intinpol 0.003314917
##        Q211 0.000000000
##    townsize 0.000000000
##  settlement 0.000000000
##      region 0.000000000
##         age 0.000000000

df_kNN <- kNN(df, variable  = colnames(df), k = 6)
summary(df_kNN) # no NAs - great!

##  Q211                      intinpol               eduR    
##  0:1072   Very interested      :148   Primary       : 36  
##  1: 738   Somewhat interested  :639   Secondary     :470  
##           Not very interested  :715   Post-secondary:699  
##           Not at all interested:308   Tertiary      :605  
##                                                           
##                                                           
##                                                           
##             townsize  
##  Under 5,000    :391  
##  5000-20000     :203  
##  20000-100000   :284  
##  100000-500000  :342  
##  500000 and more:590  
##                       
##                       
##                                                   settlement 
##  Capital city                                          :149  
##  Regional center                                       :569  
##  District center                                       :542  
##  Another city, town (not a regional or district center): 58  
##  Village                                               :492  
##                                                              
##                                                              
##           income                                        region   
##  Fifth step  :395   RU: Central Federal District           :489  
##  Sixth step  :291   RU: Volga; Privolzhsky Federal District:382  
##  Fourth step :278   RU: Siberian Federal District          :239  
##  Third step  :266   RU: South Federal District             :183  
##  Seventh step:200   RU: North West Federal District        :177  
##  second step :136   RU: Urals Federal District             :156  
##  (Other)     :244   (Other)                                :184  
##       age          income1      eduT      Q211_imp       intinpol_imp   
##  35     :  47   Min.   :0.000   0:1205   Mode :logical   Mode :logical  
##  40     :  46   1st Qu.:2.000   1: 605   FALSE:1810      FALSE:1804     
##  27     :  44   Median :4.000                            TRUE :6        
##  33     :  44   Mean   :3.823                                           
##  36     :  44   3rd Qu.:5.000                                           
##  30     :  43   Max.   :9.000                                           
##  (Other):1542                                                           
##   eduR_imp       townsize_imp    settlement_imp  income_imp     
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:1800      FALSE:1810      FALSE:1810      FALSE:1747     
##  TRUE :10                                        TRUE :63       
##                                                                 
##                                                                 
##                                                                 
##                                                                 
##  region_imp       age_imp        income1_imp      eduT_imp      
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:1810      FALSE:1810      FALSE:1747      FALSE:1800     
##                                  TRUE :63        TRUE :10       
##                                                                 
##                                                                 
##                                                                 
##

After comparing models with and without impulation I may say that the results stay the same, the model has not been significantly improved after the imputation process.

summary(model_4) # with na.omit, AIC: 2289.9

## 
## Call:
## glm(formula = Q211 ~ income1 + eduT + settlement + intinpol, 
##     family = binomial(link = logit), data = df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7889  -1.0431  -0.8065   1.2501   1.8931  
## 
## Coefficients:
##                                                                  Estimate
## (Intercept)                                                       1.20299
## income1                                                          -0.05731
## eduT1                                                             0.17166
## settlementRegional center                                        -0.72626
## settlementDistrict center                                        -1.10399
## settlementAnother city, town (not a regional or district center) -1.11736
## settlementVillage                                                -0.80521
## intinpolSomewhat interested                                      -0.41660
## intinpolNot very interested                                      -0.55018
## intinpolNot at all interested                                    -1.30746
##                                                                  Std. Error
## (Intercept)                                                         0.26584
## income1                                                             0.02733
## eduT1                                                               0.11166
## settlementRegional center                                           0.19527
## settlementDistrict center                                           0.19716
## settlementAnother city, town (not a regional or district center)    0.33658
## settlementVillage                                                   0.20051
## intinpolSomewhat interested                                         0.19050
## intinpolNot very interested                                         0.18888
## intinpolNot at all interested                                       0.21972
##                                                                  z value
## (Intercept)                                                        4.525
## income1                                                           -2.097
## eduT1                                                              1.537
## settlementRegional center                                         -3.719
## settlementDistrict center                                         -5.599
## settlementAnother city, town (not a regional or district center)  -3.320
## settlementVillage                                                 -4.016
## intinpolSomewhat interested                                       -2.187
## intinpolNot very interested                                       -2.913
## intinpolNot at all interested                                     -5.951
##                                                                  Pr(>|z|)
## (Intercept)                                                      6.03e-06
## income1                                                          0.035981
## eduT1                                                            0.124219
## settlementRegional center                                        0.000200
## settlementDistrict center                                        2.15e-08
## settlementAnother city, town (not a regional or district center) 0.000901
## settlementVillage                                                5.92e-05
## intinpolSomewhat interested                                      0.028748
## intinpolNot very interested                                      0.003581
## intinpolNot at all interested                                    2.67e-09
##                                                                     
## (Intercept)                                                      ***
## income1                                                          *  
## eduT1                                                               
## settlementRegional center                                        ***
## settlementDistrict center                                        ***
## settlementAnother city, town (not a regional or district center) ***
## settlementVillage                                                ***
## intinpolSomewhat interested                                      *  
## intinpolNot very interested                                      ** 
## intinpolNot at all interested                                    ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2357.9  on 1735  degrees of freedom
## Residual deviance: 2269.9  on 1726  degrees of freedom
##   (74 observations deleted due to missingness)
## AIC: 2289.9
## 
## Number of Fisher Scoring iterations: 4

model_4_knn <- glm(Q211 ~ income1 + eduT + settlement + intinpol, data = df_kNN, family="binomial"(link=logit)) 
summary(model_4_knn) #with kNN imputation - coefficients are the same, but AIC became larger (2373.2)

## 
## Call:
## glm(formula = Q211 ~ income1 + eduT + settlement + intinpol, 
##     family = binomial(link = logit), data = df_kNN)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7843  -1.0383  -0.7887   1.2587   1.9423  
## 
## Coefficients:
##                                                                  Estimate
## (Intercept)                                                       1.21509
## income1                                                          -0.07292
## eduT1                                                             0.14909
## settlementRegional center                                        -0.77185
## settlementDistrict center                                        -1.12085
## settlementAnother city, town (not a regional or district center) -1.13235
## settlementVillage                                                -0.83998
## intinpolSomewhat interested                                      -0.34073
## intinpolNot very interested                                      -0.49130
## intinpolNot at all interested                                    -1.30567
##                                                                  Std. Error
## (Intercept)                                                         0.26283
## income1                                                             0.02697
## eduT1                                                               0.10998
## settlementRegional center                                           0.19368
## settlementDistrict center                                           0.19618
## settlementAnother city, town (not a regional or district center)    0.32855
## settlementVillage                                                   0.19904
## intinpolSomewhat interested                                         0.18627
## intinpolNot very interested                                         0.18459
## intinpolNot at all interested                                       0.21592
##                                                                  z value
## (Intercept)                                                        4.623
## income1                                                           -2.704
## eduT1                                                              1.356
## settlementRegional center                                         -3.985
## settlementDistrict center                                         -5.713
## settlementAnother city, town (not a regional or district center)  -3.447
## settlementVillage                                                 -4.220
## intinpolSomewhat interested                                       -1.829
## intinpolNot very interested                                       -2.662
## intinpolNot at all interested                                     -6.047
##                                                                  Pr(>|z|)
## (Intercept)                                                      3.78e-06
## income1                                                          0.006851
## eduT1                                                            0.175235
## settlementRegional center                                        6.74e-05
## settlementDistrict center                                        1.11e-08
## settlementAnother city, town (not a regional or district center) 0.000568
## settlementVillage                                                2.44e-05
## intinpolSomewhat interested                                      0.067355
## intinpolNot very interested                                      0.007777
## intinpolNot at all interested                                    1.47e-09
##                                                                     
## (Intercept)                                                      ***
## income1                                                          ** 
## eduT1                                                               
## settlementRegional center                                        ***
## settlementDistrict center                                        ***
## settlementAnother city, town (not a regional or district center) ***
## settlementVillage                                                ***
## intinpolSomewhat interested                                      .  
## intinpolNot very interested                                      ** 
## intinpolNot at all interested                                    ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2447.2  on 1809  degrees of freedom
## Residual deviance: 2352.8  on 1800  degrees of freedom
## AIC: 2372.8
## 
## Number of Fisher Scoring iterations: 4

model_4_mice <- glm(Q211 ~ income1 + eduT + settlement + intinpol, data = df_MICE$data, family="binomial"(link=logit)) 
summary(model_4_mice) #coefficient and AIC are the same as in the model without imputation

## 
## Call:
## glm(formula = Q211 ~ income1 + eduT + settlement + intinpol, 
##     family = binomial(link = logit), data = df_MICE$data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7889  -1.0431  -0.8065   1.2501   1.8931  
## 
## Coefficients:
##                                                                  Estimate
## (Intercept)                                                       1.20299
## income1                                                          -0.05731
## eduT1                                                             0.17166
## settlementRegional center                                        -0.72626
## settlementDistrict center                                        -1.10399
## settlementAnother city, town (not a regional or district center) -1.11736
## settlementVillage                                                -0.80521
## intinpolSomewhat interested                                      -0.41660
## intinpolNot very interested                                      -0.55018
## intinpolNot at all interested                                    -1.30746
##                                                                  Std. Error
## (Intercept)                                                         0.26584
## income1                                                             0.02733
## eduT1                                                               0.11166
## settlementRegional center                                           0.19527
## settlementDistrict center                                           0.19716
## settlementAnother city, town (not a regional or district center)    0.33658
## settlementVillage                                                   0.20051
## intinpolSomewhat interested                                         0.19050
## intinpolNot very interested                                         0.18888
## intinpolNot at all interested                                       0.21972
##                                                                  z value
## (Intercept)                                                        4.525
## income1                                                           -2.097
## eduT1                                                              1.537
## settlementRegional center                                         -3.719
## settlementDistrict center                                         -5.599
## settlementAnother city, town (not a regional or district center)  -3.320
## settlementVillage                                                 -4.016
## intinpolSomewhat interested                                       -2.187
## intinpolNot very interested                                       -2.913
## intinpolNot at all interested                                     -5.951
##                                                                  Pr(>|z|)
## (Intercept)                                                      6.03e-06
## income1                                                          0.035981
## eduT1                                                            0.124219
## settlementRegional center                                        0.000200
## settlementDistrict center                                        2.15e-08
## settlementAnother city, town (not a regional or district center) 0.000901
## settlementVillage                                                5.92e-05
## intinpolSomewhat interested                                      0.028748
## intinpolNot very interested                                      0.003581
## intinpolNot at all interested                                    2.67e-09
##                                                                     
## (Intercept)                                                      ***
## income1                                                          *  
## eduT1                                                               
## settlementRegional center                                        ***
## settlementDistrict center                                        ***
## settlementAnother city, town (not a regional or district center) ***
## settlementVillage                                                ***
## intinpolSomewhat interested                                      *  
## intinpolNot very interested                                      ** 
## intinpolNot at all interested                                    ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2357.9  on 1735  degrees of freedom
## Residual deviance: 2269.9  on 1726  degrees of freedom
##   (74 observations deleted due to missingness)
## AIC: 2289.9
## 
## Number of Fisher Scoring iterations: 4

project logistic regression

Anna Podolskaya

1/11/2020