Modeling Categorical Outcomes in Stata


Giovanni Minchio
Yuxin Zhang


Quantitative Methods Lab, Lesson 8.1
19 Nov. 2024


Problem with LPMs

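Unbounded Y prediction in linear regression: in a linear probability model (LPM), the fitted values are not constrained. Predictions can therefore fall below 0 or rise above 1, which is impossible for a probability.

[Figure: unbounded Y predictions from a linear regression on a binary outcome]

So we need a model that restricts the predicted values to the range between 0 and 1.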

Logistic regression


Logistic (sigmoid) function: an S-shaped curve

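For a binary outcome, we want to let the input take any real value, \(X \in \mathbb{R}\), while keeping the predicted probability within \(p(X) \in [0, 1]\). Logistic regression achieves this by passing the linear predictor through the logistic function, which "squeezes" the output to lie between 0 and 1.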


\[ P(Y_i = 1 | X_i) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik})}} \] \[ P(Y_i = 0 | X_i) = 1- \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik})}} \]
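Stata has the logistic function built in as invlogit(). The small sketch below is meant to be run from a do-file, and the values of z are arbitrary. It shows the squeezing at work: whatever value the linear predictor z = β0 + β1X1 + … takes, the result stays between 0 and 1. It rises from about 0.0067 at z = -5, through 0.5 at z = 0, to about 0.9933 at z = 5.

```stata
* the logistic function maps any real z into (0, 1)
* invlogit(z) is Stata's built-in version of 1 / (1 + exp(-z))
foreach z of numlist -5 -2 0 2 5 {
    display "z = " %3.0f `z' "   p = " %6.4f invlogit(`z')  ///
        "   by hand: " %6.4f 1 / (1 + exp(-(`z')))
}
```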

E.g., probability of voter turnout:

\[ P(\text{vote} = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \, \text{age} + \beta_2 \, \text{gender} + \beta_3 \, \text{education} + \dots)}} \]

Key terms

Odds are defined as the ratio of the probability that an event occurs to the probability that it does not occur, so \(odds \in [0, +\infty)\). For a model with one predictor \(x\), intercept \(a\), and slope \(b\): \[odds = \frac{p}{1-p} = e^{a + bx}\]

Logit (log odds) is the natural logarithm of the odds (the logit function), so \(logit \in (-\infty, +\infty)\). Logistic regression assumes that the logit is linear in the predictors: the log odds of the dependent variable are modeled as a linear combination of the independent variables plus an intercept.

\[logit = \ln(odds) = \ln\left(\frac{p}{1-p}\right) = \ln(e^{a + bx}) = a + bx\]

Inverting the logit gives back the probability, where \(p \in [0, 1]\):

\[P(Y = 1) = \frac{odds}{1+odds} = \frac{\exp(logit)}{1 + \exp(logit)} = \frac{e^{a + bx}}{1 + e^{a + bx}} = \frac{1}{1 + e^{-(a + bx)}} \]
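A worked round trip through the three quantities may help. Starting from a hypothetical probability p = 0.75, the odds are 0.75 / 0.25 = 3 and the logit is ln(3) ≈ 1.0986. Converting back with odds / (1 + odds) returns 0.75. The sketch below (run from a do-file) does this by hand and compares the result with Stata's built-in logit() and invlogit() functions.

```stata
* worked example with a hypothetical probability p = 0.75
local p    = 0.75
local odds = `p' / (1 - `p')          // odds = 0.75 / 0.25 = 3
local lo   = ln(`odds')               // logit = ln(3), about 1.0986
display "odds  = " `odds'
display "logit = " `lo' "   (built-in logit(): " logit(`p') ")"
* back from the logit to the probability: odds / (1 + odds)
display "p     = " exp(`lo') / (1 + exp(`lo')) "   (built-in invlogit(): " invlogit(`lo') ")"
```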

1. Binary logistic regression

Let’s use lesson8.dta, which can be downloaded from Moodle.

cd ""
use "lesson8.dta", clear
  • first investigate variables to be used
tab1 work
-> tabulation of work  

    Working |
  condition |      Freq.     Percent        Cum.
------------+-----------------------------------
 Unemployed |      4,452       57.41       57.41
   Employed |      3,303       42.59      100.00
------------+-----------------------------------
      Total |      7,755      100.00
label list work
work:
           0 Unemployed
           1 Employed

First, a simple LPM.

  • linear regression
reg work age, robust
Linear regression                               Number of obs     =      7,755
                                                F(1, 7753)        =   51791.44
                                                Prob > F          =     0.0000
                                                R-squared         =     0.7198
                                                Root MSE          =     .26178

------------------------------------------------------------------------------
             |               Robust
        work | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0670447   .0002946   227.58   0.000     .0664672    .0676222
       _cons |  -1.171042   .0072912  -160.61   0.000    -1.185334   -1.156749
------------------------------------------------------------------------------
predict yhat
summarize yhat
(option xb assumed; fitted values)

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        yhat |      9,009    .4258633     .393403  -.1653713   1.108478
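The minimum (-0.17) and maximum (1.11) of yhat show the problem described at the start of the lesson: some fitted "probabilities" fall outside [0, 1]. Note also that predict fills in fitted values for every observation with non-missing age (9,009), not only for the 7,755 observations used in the regression. A sketch for counting the impossible predictions uses the yhat variable created above and the e(sample) marker left by reg:

```stata
* fitted values below 0 (missing values never satisfy "< 0")
count if yhat < 0
* fitted values above 1 (exclude missing, which Stata treats as larger than any number)
count if yhat > 1 & !missing(yhat)
* the same check restricted to the estimation sample of the regression
count if (yhat < 0 | yhat > 1) & e(sample)
```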
  • simple plot
twoway (scatter work age) || (lfit work age)

  • plot observations and predicted line using yhat
twoway scatter yhat work age, connect(l i) msymbol(i) sort ylabel(0 1)

Now let’s try to fit a logit model.

  • logistic regression
logit work age 
Iteration 0:  Log likelihood = -5289.9229  
Iteration 1:  Log likelihood = -1573.3867  
Iteration 2:  Log likelihood =    -1458.7  
Iteration 3:  Log likelihood = -1456.7051  
Iteration 4:  Log likelihood = -1456.6995  
Iteration 5:  Log likelihood = -1456.6995  

Logistic regression                                    Number of obs =   7,755
                                                       LR chi2(1)    = 7666.45
                                                       Prob > chi2   =  0.0000
Log likelihood = -1456.6995                            Pseudo R2     =  0.7246

------------------------------------------------------------------------------
        work | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .6590841   .0154463    42.67   0.000       .62881    .6893582
       _cons |  -16.66854   .3943523   -42.27   0.000    -17.44145   -15.89562
------------------------------------------------------------------------------

The iteration log shows how the maximum-likelihood algorithm converged. Iteration 0 is the log likelihood of the empty model (intercept only), and estimation stops once the log likelihood no longer changes (here after five iterations). The final log likelihood (-1456.6995) can be used to compare nested models. The likelihood-ratio (LR) chi-square of 7666.45 (p < .001) tells us that the model as a whole fits significantly better than an empty model, i.e., a model with no predictors.
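Both header statistics can be reproduced from the two log likelihoods. The LR chi-square is 2 × (LL_model - LL_0) = 2 × (-1456.6995 + 5289.9229) ≈ 7666.45. McFadden's pseudo R2, which is what Stata reports, is 1 - LL_model / LL_0 = 1 - 1456.6995 / 5289.9229 ≈ 0.7246. A sketch, assuming logit work age is the active estimation result (predict does not overwrite these stored results):

```stata
* e(ll) = final log likelihood, e(ll_0) = log likelihood of the empty model
display "LR chi2   = " 2 * (e(ll) - e(ll_0))
display "Pseudo R2 = " 1 - e(ll) / e(ll_0)
```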

A one-unit increase in age is associated with a 0.66 increase in the log odds of being employed (vs. unemployed), and this effect is statistically significant (p < 0.001).
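Because logit coefficients are on the log-odds scale, it often helps to translate them into predicted probabilities. Below is a minimal sketch, assuming logit work age is the most recent estimation command; the ages 20, 25, and 30 are arbitrary illustration values. invlogit() applies the formula from the Key terms section by hand, and margins computes the same probabilities together with standard errors. With the coefficients above, the hand calculation at age 25 gives roughly 0.45.

```stata
* predicted probability of being employed at age 25, by hand:
* P = invlogit(b0 + b1 * age), using the stored coefficients
display invlogit(_b[_cons] + _b[age] * 25)

* the same probability (plus a standard error) from margins
margins, at(age = 25)

* predicted probabilities at several ages, then plot them
margins, at(age = (20 25 30))
marginsplot
```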

predict yhat1
twoway scatter yhat1 work age, connect(l i) msymbol(i) sort ylabel(0 1)
(option pr assumed; Pr(work))
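To see the difference between the two models, compare the range of the LPM fitted values (yhat) with that of the logit predicted probabilities (yhat1): the logit predictions cannot leave [0, 1]. A sketch, assuming both variables were created as above (the /// line continuations only work in a do-file):

```stata
* LPM fitted values vs. logit predicted probabilities
summarize yhat yhat1

* observed outcome with both prediction curves overlaid; dashed lines mark 0 and 1
twoway (scatter work age, msymbol(oh))  ///
       (line yhat age, sort)            ///
       (line yhat1 age, sort),          ///
       legend(order(2 "LPM" 3 "Logit")) yline(0 1, lpattern(dash))
```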

Interpreting outputs

Let’s return to the voter-turnout example that we modeled with an LPM last week.

cd ""
use "ESS10.dta", clear
tab1 vote agea gndr edulvlb cntry netustm
-> tabulation of vote  

 Voted last national |
            election |      Freq.     Percent        Cum.
---------------------+-----------------------------------
                 Yes |     26,794       72.12       72.12
                  No |      7,764       20.90       93.02
Not eligible to vote |      2,594        6.98      100.00
---------------------+-----------------------------------
               Total |     37,152      100.00

-> tabulation of agea  

       Age of |
  respondent, |
   calculated |      Freq.     Percent        Cum.
--------------+-----------------------------------
           15 |         99        0.27        0.27
           16 |        267        0.72        0.98
           17 |        360        0.96        1.95
           18 |        402        1.08        3.02
           19 |        458        1.23        4.25
           20 |        385        1.03        5.28
           21 |        457        1.22        6.51
           22 |        452        1.21        7.72
           23 |        400        1.07        8.79
           24 |        389        1.04        9.83
           25 |        419        1.12       10.95
           26 |        423        1.13       12.09
           27 |        401        1.07       13.16
           28 |        451        1.21       14.37
           29 |        448        1.20       15.57
           30 |        486        1.30       16.87
           31 |        492        1.32       18.19
           32 |        517        1.39       19.58
           33 |        536        1.44       21.01
           34 |        488        1.31       22.32
           35 |        555        1.49       23.81
           36 |        584        1.56       25.37
           37 |        550        1.47       26.85
           38 |        568        1.52       28.37
           39 |        566        1.52       29.89
           40 |        598        1.60       31.49
           41 |        680        1.82       33.31
           42 |        600        1.61       34.92
           43 |        593        1.59       36.51
           44 |        567        1.52       38.03
           45 |        598        1.60       39.63
           46 |        656        1.76       41.39
           47 |        609        1.63       43.02
           48 |        623        1.67       44.69
           49 |        649        1.74       46.43
           50 |        653        1.75       48.18
           51 |        687        1.84       50.02
           52 |        672        1.80       51.82
           53 |        662        1.77       53.59
           54 |        621        1.66       55.26
           55 |        653        1.75       57.01
           56 |        715        1.92       58.92
           57 |        664        1.78       60.70
           58 |        687        1.84       62.54
           59 |        660        1.77       64.31
           60 |        687        1.84       66.15
           61 |        728        1.95       68.10
           62 |        673        1.80       69.91
           63 |        634        1.70       71.60
           64 |        598        1.60       73.21
           65 |        630        1.69       74.89
           66 |        677        1.81       76.71
           67 |        657        1.76       78.47
           68 |        609        1.63       80.10
           69 |        587        1.57       81.67
           70 |        630        1.69       83.36
           71 |        630        1.69       85.05
           72 |        573        1.54       86.59
           73 |        521        1.40       87.98
           74 |        477        1.28       89.26
           75 |        506        1.36       90.62
           76 |        444        1.19       91.81
           77 |        393        1.05       92.86
           78 |        326        0.87       93.73
           79 |        310        0.83       94.56
           80 |        312        0.84       95.40
           81 |        307        0.82       96.22
           82 |        278        0.74       96.97
           83 |        213        0.57       97.54
           84 |        182        0.49       98.03
           85 |        159        0.43       98.45
           86 |        150        0.40       98.85
           87 |         98        0.26       99.12
           88 |         86        0.23       99.35
           89 |         93        0.25       99.60
           90 |        151        0.40      100.00
--------------+-----------------------------------
        Total |     37,319      100.00

-> tabulation of gndr  

     Gender |      Freq.     Percent        Cum.
------------+-----------------------------------
       Male |     17,463       46.43       46.43
     Female |     20,148       53.57      100.00
------------+-----------------------------------
      Total |     37,611      100.00

-> tabulation of edulvlb  

             Highest level of education |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
            Not completed ISCED level 1 |        317        0.85        0.85
   ISCED 1, completed primary education |      2,252        6.01        6.86
Vocational ISCED 2C < 2 years, no acces |          8        0.02        6.88
General/pre-vocational ISCED 2A/2B, acc |        223        0.60        7.48
General ISCED 2A, access ISCED 3A gener |      4,358       11.64       19.11
Vocational ISCED 2C >= 2 years, no acce |         44        0.12       19.23
Vocational ISCED 2A/2B, access ISCED 3  |        341        0.91       20.14
Vocational ISCED 2, access ISCED 3 gene |         44        0.12       20.26
Vocational ISCED 3C < 2 years, no acces |        508        1.36       21.62
General ISCED 3A/3B, access ISCED 5B/lo |         93        0.25       21.86
General ISCED 3A, access upper tier ISC |      5,304       14.16       36.03
Vocational ISCED 3C >= 2 years, no acce |      4,078       10.89       46.92
Vocational ISCED 3A, access ISCED 5B/ l |        666        1.78       48.69
Vocational ISCED 3A, access upper tier  |      5,386       14.38       63.08
General ISCED 4A/4B, access ISCED 5B/lo |         17        0.05       63.12
General ISCED 4A, access upper tier ISC |         19        0.05       63.17
ISCED 4 programmes without access ISCED |        836        2.23       65.40
Vocational ISCED 4A/4B, access ISCED 5B |         91        0.24       65.65
Vocational ISCED 4A, access upper tier  |        996        2.66       68.31
ISCED 5A short, intermediate/academic/g |        203        0.54       68.85
ISCED 5B short, advanced vocational qua |      1,626        4.34       73.19
ISCED 5A medium, bachelor/equivalent fr |      1,665        4.45       77.64
ISCED 5A medium, bachelor/equivalent fr |      3,133        8.37       86.00
ISCED 5A long, master/equivalent from l |        792        2.11       88.12
ISCED 5A long, master/equivalent from u |      3,961       10.58       98.69
               ISCED 6, doctoral degree |        408        1.09       99.78
                                  Other |         81        0.22      100.00
----------------------------------------+-----------------------------------
                                  Total |     37,450      100.00

-> tabulation of cntry  

    Country |      Freq.     Percent        Cum.
------------+-----------------------------------
         BE |      1,341        3.57        3.57
         BG |      2,718        7.23       10.79
         CH |      1,523        4.05       14.84
         CZ |      2,476        6.58       21.42
         EE |      1,542        4.10       25.52
         FI |      1,577        4.19       29.72
         FR |      1,977        5.26       34.97
         GB |      1,149        3.05       38.03
         GR |      2,799        7.44       45.47
         HR |      1,592        4.23       49.70
         HU |      1,849        4.92       54.62
         IE |      1,770        4.71       59.33
         IS |        903        2.40       61.73
         IT |      2,640        7.02       68.75
         LT |      1,659        4.41       73.16
         ME |      1,278        3.40       76.55
         MK |      1,429        3.80       80.35
         NL |      1,470        3.91       84.26
         NO |      1,411        3.75       88.01
         PT |      1,838        4.89       92.90
         SI |      1,252        3.33       96.23
         SK |      1,418        3.77      100.00
------------+-----------------------------------
      Total |     37,611      100.00

-> tabulation of netustm  

 Internet use, |
 how much time |
    on typical |
       day, in |
       minutes |      Freq.     Percent        Cum.
---------------+-----------------------------------
             0 |         43        0.16        0.16
             1 |          7        0.03        0.18
             2 |          7        0.03        0.21
             3 |          1        0.00        0.21
             5 |         21        0.08        0.29
             6 |         62        0.22        0.51
             7 |         24        0.09        0.60
             8 |         72        0.26        0.86
             9 |         28        0.10        0.96
            10 |        142        0.51        1.47
            14 |          1        0.00        1.48
            15 |        167        0.61        2.08
            18 |          1        0.00        2.09
            20 |        126        0.46        2.54
            25 |         10        0.04        2.58
            28 |          1        0.00        2.58
            30 |      1,034        3.75        6.33
            31 |          1        0.00        6.33
            35 |          8        0.03        6.36
            38 |          2        0.01        6.37
            40 |         61        0.22        6.59
            45 |        229        0.83        7.42
            50 |         52        0.19        7.61
            55 |          8        0.03        7.64
            59 |          2        0.01        7.65
            60 |      3,401       12.32       19.97
            61 |          6        0.02       19.99
            63 |          2        0.01       20.00
            64 |          2        0.01       20.01
            65 |         18        0.07       20.07
            68 |          4        0.01       20.08
            69 |          1        0.00       20.09
            70 |         55        0.20       20.29
            71 |          2        0.01       20.29
            72 |          1        0.00       20.30
            74 |          1        0.00       20.30
            75 |         65        0.24       20.54
            78 |          1        0.00       20.54
            80 |         80        0.29       20.83
            85 |         10        0.04       20.87
            88 |          2        0.01       20.87
            90 |      1,393        5.05       25.92
            95 |          7        0.03       25.95
            98 |          2        0.01       25.95
            99 |          1        0.00       25.96
           100 |         20        0.07       26.03
           105 |         49        0.18       26.21
           110 |         43        0.16       26.36
           115 |          4        0.01       26.38
           118 |          1        0.00       26.38
           119 |          5        0.02       26.40
           120 |      4,432       16.06       42.46
           121 |          3        0.01       42.47
           122 |          6        0.02       42.49
           123 |          5        0.02       42.51
           125 |          5        0.02       42.53
           128 |          9        0.03       42.56
           130 |         34        0.12       42.68
           132 |          1        0.00       42.69
           133 |          1        0.00       42.69
           135 |         43        0.16       42.85
           138 |          1        0.00       42.85
           140 |         50        0.18       43.03
           143 |          1        0.00       43.04
           145 |          1        0.00       43.04
           150 |      1,106        4.01       47.05
           155 |          3        0.01       47.06
           158 |          4        0.01       47.07
           160 |         24        0.09       47.16
           165 |         29        0.11       47.26
           168 |          1        0.00       47.27
           170 |         13        0.05       47.32
           175 |          1        0.00       47.32
           177 |          1        0.00       47.32
           180 |      3,147       11.40       58.73
           181 |          3        0.01       58.74
           182 |          2        0.01       58.74
           183 |          5        0.02       58.76
           185 |         14        0.05       58.81
           188 |          2        0.01       58.82
           189 |          1        0.00       58.82
           190 |         24        0.09       58.91
           192 |          2        0.01       58.92
           195 |         17        0.06       58.98
           196 |          1        0.00       58.98
           198 |          1        0.00       58.99
           200 |         32        0.12       59.10
           202 |          1        0.00       59.11
           204 |          1        0.00       59.11
           205 |          5        0.02       59.13
           210 |        537        1.95       61.07
           215 |          3        0.01       61.08
           220 |          3        0.01       61.10
           225 |         12        0.04       61.14
           230 |         21        0.08       61.21
           240 |      2,207        8.00       69.21
           242 |          1        0.00       69.22
           243 |          1        0.00       69.22
           244 |          3        0.01       69.23
           245 |          1        0.00       69.23
           246 |          1        0.00       69.24
           248 |          1        0.00       69.24
           250 |          9        0.03       69.27
           255 |          8        0.03       69.30
           256 |          1        0.00       69.31
           260 |         21        0.08       69.38
           264 |          1        0.00       69.39
           265 |          4        0.01       69.40
           270 |        353        1.28       70.68
           271 |          1        0.00       70.68
           275 |          2        0.01       70.69
           276 |          1        0.00       70.69
           278 |          1        0.00       70.70
           280 |         10        0.04       70.73
           285 |          6        0.02       70.76
           290 |          4        0.01       70.77
           299 |          1        0.00       70.77
           300 |      1,948        7.06       77.83
           301 |          4        0.01       77.85
           302 |          1        0.00       77.85
           304 |          1        0.00       77.85
           305 |          3        0.01       77.86
           308 |          3        0.01       77.88
           310 |         11        0.04       77.92
           311 |          1        0.00       77.92
           315 |          6        0.02       77.94
           320 |         12        0.04       77.98
           325 |          2        0.01       77.99
           328 |          1        0.00       77.99
           330 |        234        0.85       78.84
           333 |          1        0.00       78.85
           338 |          1        0.00       78.85
           340 |          4        0.01       78.86
           345 |          3        0.01       78.88
           350 |         11        0.04       78.92
           359 |          1        0.00       78.92
           360 |      1,205        4.37       83.29
           361 |          2        0.01       83.29
           362 |          1        0.00       83.30
           363 |          1        0.00       83.30
           365 |          2        0.01       83.31
           368 |          2        0.01       83.31
           370 |          5        0.02       83.33
           375 |          4        0.01       83.35
           377 |          1        0.00       83.35
           380 |          7        0.03       83.38
           390 |        126        0.46       83.83
           400 |          3        0.01       83.84
           405 |          6        0.02       83.86
           410 |          6        0.02       83.89
           420 |        482        1.75       85.63
           425 |          4        0.01       85.65
           430 |          4        0.01       85.66
           435 |          2        0.01       85.67
           440 |          2        0.01       85.68
           445 |          1        0.00       85.68
           450 |         62        0.22       85.90
           460 |          1        0.00       85.91
           470 |          2        0.01       85.92
           480 |      1,315        4.76       90.68
           481 |          1        0.00       90.68
           485 |          1        0.00       90.69
           488 |          6        0.02       90.71
           489 |          1        0.00       90.71
           490 |          5        0.02       90.73
           492 |          1        0.00       90.73
           495 |          3        0.01       90.75
           500 |          5        0.02       90.76
           505 |          1        0.00       90.77
           510 |         95        0.34       91.11
           520 |          4        0.01       91.13
           525 |          2        0.01       91.13
           530 |          2        0.01       91.14
           533 |          1        0.00       91.14
           540 |        406        1.47       92.62
           545 |          1        0.00       92.62
           555 |          1        0.00       92.62
           560 |          1        0.00       92.63
           570 |         53        0.19       92.82
           580 |          1        0.00       92.82
           585 |          1        0.00       92.83
           590 |          4        0.01       92.84
           595 |          6        0.02       92.86
           599 |          2        0.01       92.87
           600 |      1,126        4.08       96.95
           601 |          1        0.00       96.95
           602 |          1        0.00       96.96
           608 |          1        0.00       96.96
           609 |          1        0.00       96.96
           610 |          3        0.01       96.97
           615 |          2        0.01       96.98
           620 |          4        0.01       97.00
           630 |         29        0.11       97.10
           640 |          1        0.00       97.10
           650 |          2        0.01       97.11
           660 |        108        0.39       97.50
           665 |          1        0.00       97.51
           690 |          5        0.02       97.53
           720 |        428        1.55       99.08
           732 |          1        0.00       99.08
           735 |          1        0.00       99.08
           740 |          2        0.01       99.09
           745 |          1        0.00       99.09
           750 |         13        0.05       99.14
           765 |          1        0.00       99.14
           780 |         31        0.11       99.26
           810 |          1        0.00       99.26
           840 |         58        0.21       99.47
           870 |          1        0.00       99.47
           899 |          2        0.01       99.48
           900 |         60        0.22       99.70
           930 |          2        0.01       99.71
           940 |          1        0.00       99.71
           960 |         37        0.13       99.84
           990 |          1        0.00       99.85
          1020 |          5        0.02       99.87
          1038 |          1        0.00       99.87
          1080 |         13        0.05       99.92
          1140 |          1        0.00       99.92
          1200 |         12        0.04       99.96
          1380 |          3        0.01       99.97
          1440 |          7        0.03      100.00
---------------+-----------------------------------
         Total |     27,598      100.00

Recode variables

  • keep four countries: GB (United Kingdom), NO (Norway), FR (France), and IT (Italy)
keep if inlist(cntry, "GB", "NO", "FR", "IT")
tab cntry
(30,434 observations deleted)

    Country |      Freq.     Percent        Cum.
------------+-----------------------------------
         FR |      1,977       27.55       27.55
         GB |      1,149       16.01       43.56
         IT |      2,640       36.78       80.34
         NO |      1,411       19.66      100.00
------------+-----------------------------------
      Total |      7,177      100.00
  • convert the string country variable to numeric for the regression, with values starting from 0 (so NO = 0 becomes the reference category)
label define cntry_4 0 "NO" 1 "IT" 2 "FR" 3 "GB"
encode cntry, gen(cntry_4) label(cntry_4)
  • check
tab cntry_4
label list cntry_4
    Country |      Freq.     Percent        Cum.
------------+-----------------------------------
         NO |      1,411       19.66       19.66
         IT |      2,640       36.78       56.44
         FR |      1,977       27.55       83.99
         GB |      1,149       16.01      100.00
------------+-----------------------------------
      Total |      7,177      100.00

cntry_4:
           0 NO
           1 IT
           2 FR
           3 GB
  • vote
label list vote
drop if vote == 3
recode vote (1 = 1) (2 = 0), gen(vote_bi)
tab vote_bi
vote:
           1 Yes
           2 No
           3 Not eligible to vote
          .a Refusal
          .b Don't know
          .c No answer

(742 observations deleted)

(1,480 differences between vote and vote_bi)


  RECODE of |
vote (Voted |
       last |
   national |
  election) |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,480       23.47       23.47
          1 |      4,825       76.53      100.00
------------+-----------------------------------
      Total |      6,305      100.00
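A cross-tabulation is a quick way to confirm that the recode did what we intended. Yes (1) should map to 1 and No (2) to 0. The refusal, don't know, and no answer codes (.a, .b, .c) should stay missing, because recode with gen() copies unlisted values unchanged. A sketch:

```stata
* check the vote recode; the missing option also lists .a/.b/.c
tab vote vote_bi, missing
```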
  • age range
keep if agea >= 25 & agea <= 60
(2,725 observations deleted)
  • recode gender so that 0 = male, 1 = female
label list gndr
recode gndr (1 = 0) (2 = 1), gen(gndr_bi)
tab gndr_bi
gndr:
           1 Male
           2 Female
          .a No answer

(3,710 differences between gndr and gndr_bi)


  RECODE of |
       gndr |
   (Gender) |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,797       48.44       48.44
          1 |      1,913       51.56      100.00
------------+-----------------------------------
      Total |      3,710      100.00
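  • recode education so that 0 = no tertiary degree (codes 0 to 520) and 1 = tertiary degree, i.e. bachelor's or higher (codes 610 to 800); "Other" (5555) is set to missing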
label list edulvlb
recode edulvlb (0/520 = 0) (610/888 = 1) (5555 = .), gen(edu_bi)
edulvlb:
           0 Not completed ISCED level 1
         113 ISCED 1, completed primary education
         129 Vocational ISCED 2C < 2 years, no access ISCED 3
         212 General/pre-vocational ISCED 2A/2B, access ISCED 3 vocational
         213 General ISCED 2A, access ISCED 3A general/all 3
         221 Vocational ISCED 2C >= 2 years, no access ISCED 3
         222 Vocational ISCED 2A/2B, access ISCED 3 vocational
         223 Vocational ISCED 2, access ISCED 3 general/all
         229 Vocational ISCED 3C < 2 years, no access ISCED 5
         311 General ISCED 3 >=2 years, no access ISCED 5
         312 General ISCED 3A/3B, access ISCED 5B/lower tier 5A
         313 General ISCED 3A, access upper tier ISCED 5A/all 5
         321 Vocational ISCED 3C >= 2 years, no access ISCED 5
         322 Vocational ISCED 3A, access ISCED 5B/ lower tier 5A
         323 Vocational ISCED 3A, access upper tier ISCED 5A/all 5
         412 General ISCED 4A/4B, access ISCED 5B/lower tier 5A
         413 General ISCED 4A, access upper tier ISCED 5A/all 5
         421 ISCED 4 programmes without access ISCED 5
         422 Vocational ISCED 4A/4B, access ISCED 5B/lower tier 5A
         423 Vocational ISCED 4A, access upper tier ISCED 5A/all 5
         510 ISCED 5A short, intermediate/academic/general tertiary below bachelor
         520 ISCED 5B short, advanced vocational qualifications
         610 ISCED 5A medium, bachelor/equivalent from lower tier tertiary
         620 ISCED 5A medium, bachelor/equivalent from upper/single tier tertiary
         710 ISCED 5A long, master/equivalent from lower tier tertiary
         720 ISCED 5A long, master/equivalent from upper/single tier tertiary
         800 ISCED 6, doctoral degree
        5555 Other
          .a Refusal
          .b Don't know
          .c No answer

(3,678 differences between edulvlb and edu_bi)
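The same kind of check works for the education dummy. Because edulvlb has many categories, the cross-tabulation is long. A couple of assert statements can confirm that the new variable only takes the intended values (a sketch):

```stata
* which original education codes ended up in 0, 1, or missing?
tab edulvlb edu_bi, missing

* edu_bi must be 0, 1, or missing, and "Other" (5555) must be missing
assert inlist(edu_bi, 0, 1) | missing(edu_bi)
assert missing(edu_bi) if edulvlb == 5555
```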

Log odds: beta

  • fit the logit model
logit vote_bi agea i.edu_bi netustm i.gndr_bi i.cntry_4
Iteration 0:  Log likelihood = -1780.0819  
Iteration 1:  Log likelihood = -1553.2508  
Iteration 2:  Log likelihood = -1538.6147  
Iteration 3:  Log likelihood = -1538.5512  
Iteration 4:  Log likelihood = -1538.5512  

Logistic regression                                     Number of obs =  3,279
                                                        LR chi2(7)    = 483.06
                                                        Prob > chi2   = 0.0000
Log likelihood = -1538.5512                             Pseudo R2     = 0.1357

------------------------------------------------------------------------------
     vote_bi | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        agea |   .0529852   .0046327    11.44   0.000     .0439052    .0620651
    1.edu_bi |   .8685538   .1041379     8.34   0.000     .6644474     1.07266
     netustm |    .000117   .0002564     0.46   0.648    -.0003855    .0006195
   1.gndr_bi |   .0008793   .0899158     0.01   0.992    -.1753525    .1771111
             |
     cntry_4 |
         IT  |  -.6177332   .1549437    -3.99   0.000    -.9214172   -.3140492
         FR  |  -1.961983   .1484614   -13.22   0.000    -2.252962   -1.671004
         GB  |  -1.082758   .1690285    -6.41   0.000    -1.414048   -.7514684
             |
       _cons |  -.3336569   .2559958    -1.30   0.192    -.8353994    .1680856
------------------------------------------------------------------------------

E.g., beta(agea) = .053 means that a one-year increase in age is associated with a 0.053 increase in the log odds of voting, holding all other variables constant.

You can use display exp(beta), with Stata as a calculator, to check the odds ratios: e.g., display exp(.0529852) gives about 1.0544, the odds ratio for agea.
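Rather than retyping coefficients, you can exponentiate the stored estimates directly. A minimal sketch, assuming the logit model above is still the active estimation result (the coefficient names are those shown in the output; 2.cntry_4 is FR, following the 1/2/3 = IT/FR/GB coding visible in the margins output further below):

```stata
* Odds ratios by hand from the stored logit coefficients
* (assumes the logit model above is the active estimation result)
display exp(_b[agea])        // one more year of age
display exp(_b[1.edu_bi])    // tertiary vs. non-tertiary education
display exp(_b[2.cntry_4])   // FR vs. the base country (NO)
* lincom with the or option also reports a confidence interval for the odds ratio
lincom 1.edu_bi, or
```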

Odds ratios: exp(beta)

  • use the or option (odds ratios) with the logit command
logit vote_bi agea i.edu_bi netustm i.gndr_bi i.cntry_4, or
Iteration 0:  Log likelihood = -1780.0819  
Iteration 1:  Log likelihood = -1553.2508  
Iteration 2:  Log likelihood = -1538.6147  
Iteration 3:  Log likelihood = -1538.5512  
Iteration 4:  Log likelihood = -1538.5512  

Logistic regression                                     Number of obs =  3,279
                                                        LR chi2(7)    = 483.06
                                                        Prob > chi2   = 0.0000
Log likelihood = -1538.5512                             Pseudo R2     = 0.1357

------------------------------------------------------------------------------
     vote_bi | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        agea |   1.054414   .0048848    11.44   0.000     1.044883    1.064032
    1.edu_bi |   2.383462   .2482086     8.34   0.000     1.943416    2.923146
     netustm |   1.000117   .0002564     0.46   0.648     .9996146     1.00062
   1.gndr_bi |    1.00088   .0899949     0.01   0.992     .8391612    1.193764
             |
     cntry_4 |
         IT  |   .5391652   .0835402    -3.99   0.000     .3979547    .7304831
         FR  |   .1405794   .0208706   -13.22   0.000     .1050875    .1880582
         GB  |   .3386602   .0572432    -6.41   0.000      .243157    .4716735
             |
       _cons |   .7162995   .1833697    -1.30   0.192     .4337012    1.183038
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
  • or use the logistic command, which reports odds ratios by default
logistic vote_bi agea i.edu_bi netustm i.gndr_bi i.cntry_4
Logistic regression                                     Number of obs =  3,279
                                                        LR chi2(7)    = 483.06
                                                        Prob > chi2   = 0.0000
Log likelihood = -1538.5512                             Pseudo R2     = 0.1357

------------------------------------------------------------------------------
     vote_bi | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        agea |   1.054414   .0048848    11.44   0.000     1.044883    1.064032
    1.edu_bi |   2.383462   .2482086     8.34   0.000     1.943416    2.923146
     netustm |   1.000117   .0002564     0.46   0.648     .9996146     1.00062
   1.gndr_bi |    1.00088   .0899949     0.01   0.992     .8391612    1.193764
             |
     cntry_4 |
         IT  |   .5391652   .0835402    -3.99   0.000     .3979547    .7304831
         FR  |   .1405794   .0208706   -13.22   0.000     .1050875    .1880582
         GB  |   .3386602   .0572432    -6.41   0.000      .243157    .4716735
             |
       _cons |   .7162995   .1833697    -1.30   0.192     .4337012    1.183038
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

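An odds ratio is odds(if the variable is incremented by 1 unit)/odds(if the variable stays at its original value); for a dummy or factor variable, it is the odds in that category divided by the odds in the base category. E.g., the odds ratio of 2.38 for 1.edu_bi means that the odds of voting are about 2.38 times as high for respondents with tertiary education as for those without, holding all other variables constant.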
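Using the odds formula from the key terms, this ratio reduces to the exponentiated coefficient, which is why exp(beta) is the odds ratio:

\[ \frac{odds(x+1)}{odds(x)} = \frac{e^{a + b(x+1)}}{e^{a + bx}} = e^{b} \]

E.g., for 1.edu_bi: \(e^{0.8685538} \approx 2.383\), the odds ratio reported in the tables above.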

Probabilities

logit vote_bi agea i.edu_bi netustm i.gndr_bi i.cntry_4
Iteration 0:  Log likelihood = -1780.0819  
Iteration 1:  Log likelihood = -1553.2508  
Iteration 2:  Log likelihood = -1538.6147  
Iteration 3:  Log likelihood = -1538.5512  
Iteration 4:  Log likelihood = -1538.5512  

Logistic regression                                     Number of obs =  3,279
                                                        LR chi2(7)    = 483.06
                                                        Prob > chi2   = 0.0000
Log likelihood = -1538.5512                             Pseudo R2     = 0.1357

------------------------------------------------------------------------------
     vote_bi | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        agea |   .0529852   .0046327    11.44   0.000     .0439052    .0620651
    1.edu_bi |   .8685538   .1041379     8.34   0.000     .6644474     1.07266
     netustm |    .000117   .0002564     0.46   0.648    -.0003855    .0006195
   1.gndr_bi |   .0008793   .0899158     0.01   0.992    -.1753525    .1771111
             |
     cntry_4 |
         IT  |  -.6177332   .1549437    -3.99   0.000    -.9214172   -.3140492
         FR  |  -1.961983   .1484614   -13.22   0.000    -2.252962   -1.671004
         GB  |  -1.082758   .1690285    -6.41   0.000    -1.414048   -.7514684
             |
       _cons |  -.3336569   .2559958    -1.30   0.192    -.8353994    .1680856
------------------------------------------------------------------------------

With margins

  • predicted probabilities of the outcome
margins edu_bi
margins gndr_bi
margins cntry_4
Predictive margins                                       Number of obs = 3,279
Model VCE: OIM

Expression: Pr(vote_bi), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      edu_bi |
          0  |   .7202895   .0093642    76.92   0.000     .7019361    .7386429
          1  |   .8465596   .0100391    84.33   0.000     .8268833     .866236
------------------------------------------------------------------------------


Predictive margins                                       Number of obs = 3,279
Model VCE: OIM

Expression: Pr(vote_bi), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     gndr_bi |
          0  |    .766934   .0097441    78.71   0.000     .7478359     .786032
          1  |   .7670676   .0095499    80.32   0.000     .7483502    .7857851
------------------------------------------------------------------------------


Predictive margins                                       Number of obs = 3,279
Model VCE: OIM

Expression: Pr(vote_bi), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     cntry_4 |
         NO  |   .8984267    .011336    79.25   0.000     .8762087    .9206448
         IT  |   .8307957   .0108654    76.46   0.000        .8095    .8520914
         FR  |    .586057   .0157509    37.21   0.000     .5551857    .6169282
         GB  |   .7605729   .0189654    40.10   0.000     .7234013    .7977444
------------------------------------------------------------------------------
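To see what these predictive margins are, they can be reproduced by hand: set edu_bi to one value for everybody in the estimation sample, predict the probability of voting, and average. A minimal sketch, assuming the logit model above is the active estimation result; the variables p_edu0 and p_edu1 and the scalars pm_edu0 and pm_edu1 are arbitrary names introduced here:

```stata
* Predictive margins for edu_bi by hand (assumes the logit model above is
* the active estimation result; new variable and scalar names are arbitrary)
preserve
* Everybody without tertiary education: predict and average
quietly replace edu_bi = 0 if e(sample)
predict p_edu0 if e(sample), pr
summarize p_edu0, meanonly
scalar pm_edu0 = r(mean)
* Everybody with tertiary education: predict and average
quietly replace edu_bi = 1 if e(sample)
predict p_edu1 if e(sample), pr
summarize p_edu1, meanonly
scalar pm_edu1 = r(mean)
* Bring back the original data; the scalars survive restore
restore
display "edu_bi = 0: " pm_edu0 "   edu_bi = 1: " pm_edu1
```

The two averages should match the margins reported for edu_bi; margins additionally provides delta-method standard errors.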
  • average marginal effects (AMEs)
margins, dydx(agea)
margins, dydx(netustm)
margins, dydx(cntry_4)
Average marginal effects                                 Number of obs = 3,279
Model VCE: OIM

Expression: Pr(vote_bi), predict()
dy/dx wrt:  agea

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        agea |   .0080538   .0006605    12.19   0.000     .0067592    .0093483
------------------------------------------------------------------------------


Average marginal effects                                 Number of obs = 3,279
Model VCE: OIM

Expression: Pr(vote_bi), predict()
dy/dx wrt:  netustm

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     netustm |   .0000178    .000039     0.46   0.648    -.0000586    .0000941
------------------------------------------------------------------------------


Average marginal effects                                 Number of obs = 3,279
Model VCE: OIM

Expression: Pr(vote_bi), predict()
dy/dx wrt:  1.cntry_4 2.cntry_4 3.cntry_4

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     cntry_4 |
         IT  |   -.067631   .0159567    -4.24   0.000    -.0989056   -.0363565
         FR  |  -.3123698    .019558   -15.97   0.000    -.3507028   -.2740368
         GB  |  -.1378539   .0218305    -6.31   0.000    -.1806409   -.0950669
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
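For a continuous predictor, the logit slope on the probability scale is \(\beta \cdot p(1-p)\), so it differs across respondents; the AME averages it over the sample. A minimal sketch of this computation for agea, assuming the logit model above is still the active estimation result (margins leaves it in place unless the post option is used); p_ame, slope_agea and ame_agea are arbitrary names:

```stata
* AME of agea by hand: average of b_agea * p * (1 - p) over the estimation sample
* (assumes the logit model above is the active estimation result)
predict p_ame if e(sample), pr
generate double slope_agea = _b[agea] * p_ame * (1 - p_ame) if e(sample)
summarize slope_agea, meanonly
scalar ame_agea = r(mean)
display "AME of agea: " ame_agea
```

The result should agree with the dy/dx reported by margins, dydx(agea).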

Compare to the LPM

regress vote_bi agea i.edu_bi netustm i.gndr_bi i.cntry_4, robust 
Linear regression                               Number of obs     =      3,279
                                                F(7, 3271)        =      77.11
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1412
                                                Root MSE          =     .39223

------------------------------------------------------------------------------
             |               Robust
     vote_bi | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        agea |   .0082132   .0007075    11.61   0.000      .006826    .0096003
    1.edu_bi |   .1274084   .0143858     8.86   0.000     .0992024    .1556144
     netustm |   .0000223   .0000386     0.58   0.563    -.0000534    .0000981
   1.gndr_bi |  -.0016086   .0137437    -0.12   0.907    -.0285558    .0253385
             |
     cntry_4 |
         IT  |  -.0549815   .0162725    -3.38   0.001    -.0868868   -.0230762
         FR  |  -.3062807   .0194974   -15.71   0.000     -.344509   -.2680524
         GB  |  -.1251441   .0203418    -6.15   0.000    -.1650281   -.0852602
             |
       _cons |   .4760632   .0397109    11.99   0.000     .3982024    .5539239
------------------------------------------------------------------------------

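The LPM coefficients are very close to the AMEs from the logit model (e.g., .0082 vs. .0081 for agea, and -.306 vs. -.312 for FR), so in this case the LPM gives a good approximation of the average effects on the probability of voting.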
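The comparison is easier with both sets of estimates in one table. A minimal sketch, assuming lesson8.dta is loaded; ame_logit and lpm are arbitrary names for the stored estimates, and margins must be run with post so that its AMEs can be stored:

```stata
* Logit AMEs next to LPM coefficients (stored-estimate names are arbitrary)
quietly logit vote_bi agea i.edu_bi netustm i.gndr_bi i.cntry_4
quietly margins, dydx(*) post
estimates store ame_logit
quietly regress vote_bi agea i.edu_bi netustm i.gndr_bi i.cntry_4, robust
estimates store lpm
estimates table ame_logit lpm, b(%9.4f) se
```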

2. Multinomial logistic regression

For unordered categorical outcomes.

Let’s say we are interested in predicting party preference prtcleit in Italy using a set of covariates.

Recode variables

  • keep Italy
keep if cntry == "IT"
(2,368 observations deleted)
  • check the outcome variable
tab prtcleit
   Which party feel closer to, Italy |      Freq.     Percent        Cum.
-------------------------------------+-----------------------------------
                  Movimento 5 Stelle |         64       21.05       21.05
                 Partido Democratico |         88       28.95       50.00
                                Lega |         37       12.17       62.17
                        Forza Italia |         30        9.87       72.04
Fratelli d'Italia con Giorgia Meloni |         52       17.11       89.14
               Liberi e Uguali (LEU) |          5        1.64       90.79
                            + Europa |          2        0.66       91.45
              Noi con l'Italia - UDC |          4        1.32       92.76
                    Potere al popolo |          6        1.97       94.74
                            SVP-PATT |          6        1.97       96.71
                               Altro |          2        0.66       97.37
                   Partito Comunista |          3        0.99       98.36
                  Partito Socialista |          1        0.33       98.68
                            Italexit |          2        0.66       99.34
                   Azione di Calenda |          2        0.66      100.00
-------------------------------------+-----------------------------------
                               Total |        304      100.00
  • drop small categories (this also drops respondents with a missing party preference)
keep if prtcleit < 6
tab prtcleit
(1,071 observations deleted)


   Which party feel closer to, Italy |      Freq.     Percent        Cum.
-------------------------------------+-----------------------------------
                  Movimento 5 Stelle |         64       23.62       23.62
                 Partido Democratico |         88       32.47       56.09
                                Lega |         37       13.65       69.74
                        Forza Italia |         30       11.07       80.81
Fratelli d'Italia con Giorgia Meloni |         52       19.19      100.00
-------------------------------------+-----------------------------------
                               Total |        271      100.00
  • check predictors
tab1 agea gndr_bi edu_bi
-> tabulation of agea  

       Age of |
  respondent, |
   calculated |      Freq.     Percent        Cum.
--------------+-----------------------------------
           25 |          6        2.21        2.21
           26 |          3        1.11        3.32
           27 |          9        3.32        6.64
           28 |          5        1.85        8.49
           29 |          6        2.21       10.70
           30 |          3        1.11       11.81
           31 |          5        1.85       13.65
           32 |          4        1.48       15.13
           33 |          3        1.11       16.24
           34 |          5        1.85       18.08
           35 |          8        2.95       21.03
           36 |          9        3.32       24.35
           37 |          6        2.21       26.57
           38 |          5        1.85       28.41
           39 |          4        1.48       29.89
           40 |          4        1.48       31.37
           41 |         10        3.69       35.06
           42 |          6        2.21       37.27
           43 |          7        2.58       39.85
           44 |          5        1.85       41.70
           45 |          9        3.32       45.02
           46 |          8        2.95       47.97
           47 |         12        4.43       52.40
           48 |         13        4.80       57.20
           49 |          9        3.32       60.52
           50 |          9        3.32       63.84
           51 |          8        2.95       66.79
           52 |         12        4.43       71.22
           53 |          3        1.11       72.32
           54 |         15        5.54       77.86
           55 |          3        1.11       78.97
           56 |         14        5.17       84.13
           57 |         10        3.69       87.82
           58 |         11        4.06       91.88
           59 |         10        3.69       95.57
           60 |         12        4.43      100.00
--------------+-----------------------------------
        Total |        271      100.00

-> tabulation of gndr_bi  

  RECODE of |
       gndr |
   (Gender) |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        134       49.45       49.45
          1 |        137       50.55      100.00
------------+-----------------------------------
      Total |        271      100.00

-> tabulation of edu_bi  

  RECODE of |
    edulvlb |
   (Highest |
   level of |
 education) |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        193       72.28       72.28
          1 |         74       27.72      100.00
------------+-----------------------------------
      Total |        267      100.00

Log odds

  • fit mlogit
mlogit prtcleit agea i.gndr_bi i.edu_bi
Iteration 0:  Log likelihood = -411.73463  
Iteration 1:  Log likelihood = -400.95048  
Iteration 2:  Log likelihood = -400.57344  
Iteration 3:  Log likelihood = -400.57044  
Iteration 4:  Log likelihood = -400.57044  

Multinomial logistic regression                         Number of obs =    267
                                                        LR chi2(12)   =  22.33
                                                        Prob > chi2   = 0.0340
Log likelihood = -400.57044                             Pseudo R2     = 0.0271

--------------------------------------------------------------------------------------------------
                        prtcleit | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
---------------------------------+----------------------------------------------------------------
Movimento_5_Stelle               |
                            agea |  -.0166718   .0168768    -0.99   0.323    -.0497497    .0164061
                       1.gndr_bi |  -.3622246   .3402236    -1.06   0.287    -1.029051    .3046013
                        1.edu_bi |  -.8921583   .3913603    -2.28   0.023     -1.65921   -.1251063
                           _cons |   .8451113   .8128062     1.04   0.298    -.7479597    2.438182
---------------------------------+----------------------------------------------------------------
Partido_Democratico              |  (base outcome)
---------------------------------+----------------------------------------------------------------
Lega                             |
                            agea |   .0157313   .0209567     0.75   0.453    -.0253431    .0568056
                       1.gndr_bi |   .0304104   .4043588     0.08   0.940    -.7621182     .822939
                        1.edu_bi |  -1.896445   .6472133    -2.93   0.003     -3.16496   -.6279307
                           _cons |  -1.196412   1.042321    -1.15   0.251    -3.239324    .8464987
---------------------------------+----------------------------------------------------------------
Forza_Italia                     |
                            agea |  -.0117892   .0211426    -0.56   0.577     -.053228    .0296496
                       1.gndr_bi |  -.1336221   .4262747    -0.31   0.754    -.9691051     .701861
                        1.edu_bi |  -.2554771   .4555397    -0.56   0.575    -1.148319    .6373643
                           _cons |   -.372004   1.022091    -0.36   0.716    -2.375266    1.631258
---------------------------------+----------------------------------------------------------------
Fratelli_d_Italia_con_Giorgia_Me |
                            agea |   .0224098   .0183421     1.22   0.222    -.0135402    .0583597
                       1.gndr_bi |   .0633409   .3557715     0.18   0.859    -.6339585    .7606403
                        1.edu_bi |  -.3498695   .3860904    -0.91   0.365    -1.106593    .4068539
                           _cons |  -1.448272   .9203001    -1.57   0.116    -3.252027    .3554831
--------------------------------------------------------------------------------------------------

For each one-year increase in age, the log odds of preferring “Movimento 5 Stelle” over “Partido Democratico” (the base outcome) decrease by approximately 0.017, holding other variables constant, although this coefficient is not statistically significant (p = 0.323).
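Each equation compares one party with the base outcome, so a contrast between two non-base parties is the difference of their coefficients. A minimal sketch, assuming the mlogit model above is the active estimation result and that the outcome values are 1 to 5 in the order shown in the table (so Lega is 3):

```stata
* M5S vs. Lega for age: difference between the two equations
* (both are contrasts with the base outcome, Partido Democratico)
lincom [Movimento_5_Stelle]agea - [Lega]agea
* Equivalent: re-estimate with Lega (value 3) as the base outcome;
* the Movimento_5_Stelle equation then reports the same contrast directly
mlogit prtcleit agea i.gndr_bi i.edu_bi, baseoutcome(3)
```

By default, Stata uses the most frequent category as the base outcome, which is why Partido Democratico is the base above.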

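Relative risk ratios: exp(beta)

The relative risk is the ratio of the probability of one outcome category over the probability of the base category. Exponentiating an mlogit coefficient gives the relative risk ratio (RRR), i.e., the factor by which this relative risk changes for a one-unit increase in the predictor.

(P.s., relative risks: \(r_1 = \frac{P(y = 1)}{P(y = basecategory)}\), \(r_2 = \frac{P(y = 2)}{P(y = basecategory)}\) …; the RRR of predictor \(x\) for outcome \(j\) is \(\frac{r_j(x+1)}{r_j(x)} = e^{\beta_j}\).)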

  • add rrr option for relative risk ratios
mlogit prtcleit agea i.gndr_bi i.edu_bi, rrr
Iteration 0:  Log likelihood = -411.73463  
Iteration 1:  Log likelihood = -400.95048  
Iteration 2:  Log likelihood = -400.57344  
Iteration 3:  Log likelihood = -400.57044  
Iteration 4:  Log likelihood = -400.57044  

Multinomial logistic regression                         Number of obs =    267
                                                        LR chi2(12)   =  22.33
                                                        Prob > chi2   = 0.0340
Log likelihood = -400.57044                             Pseudo R2     = 0.0271

--------------------------------------------------------------------------------------------------
                        prtcleit |        RRR   Std. err.      z    P>|z|     [95% conf. interval]
---------------------------------+----------------------------------------------------------------
Movimento_5_Stelle               |
                            agea |   .9834664   .0165978    -0.99   0.323     .9514675    1.016541
                       1.gndr_bi |    .696126   .2368385    -1.06   0.287     .3573461    1.356084
                        1.edu_bi |   .4097704   .1603679    -2.28   0.023     .1902892    .8824031
                           _cons |   2.328237   1.892406     1.04   0.298     .4733313     11.4522
---------------------------------+----------------------------------------------------------------
Partido_Democratico              |  (base outcome)
---------------------------------+----------------------------------------------------------------
Lega                             |
                            agea |   1.015856    .021289     0.75   0.453     .9749754     1.05845
                       1.gndr_bi |   1.030878   .4168444     0.08   0.940     .4666769    2.277183
                        1.edu_bi |   .1501012   .0971475    -2.93   0.003     .0422158     .533695
                           _cons |   .3022767   .3150693    -1.15   0.251     .0391904    2.331469
---------------------------------+----------------------------------------------------------------
Forza_Italia                     |
                            agea |     .98828   .0208948    -0.56   0.577     .9481638    1.030094
                       1.gndr_bi |   .8749206   .3729565    -0.31   0.754     .3794224    2.017504
                        1.edu_bi |   .7745469   .3528368    -0.56   0.575     .3171696    1.891489
                           _cons |   .6893515     .70458    -0.36   0.716     .0929898    5.110297
---------------------------------+----------------------------------------------------------------
Fratelli_d_Italia_con_Giorgia_Me |
                            agea |   1.022663   .0187578     1.22   0.222     .9865511    1.060096
                       1.gndr_bi |    1.06539   .3790354     0.18   0.859     .5304877    2.139646
                        1.edu_bi |   .7047801   .2721089    -0.91   0.365     .3306837    1.502085
                           _cons |    .234976   .2162484    -1.57   0.116     .0386957     1.42687
--------------------------------------------------------------------------------------------------
Note: _cons estimates baseline relative risk for each outcome.

The relative risk ratio for a one-unit increase in agea is .9834664 (i.e., exp(-.0166718), using the coefficient from the original mlogit output) for being in category “Movimento 5 Stelle” vs. “Partido Democratico”.

The relative risk ratio for a one-unit increase in edu_bi is .4097704 for being in category “Movimento 5 Stelle” vs. “Partido Democratico”. In other words, for respondents with tertiary education, the relative risk of preferring “Movimento 5 Stelle” over “Partido Democratico” is about 0.41 times (roughly 59% lower than) that of respondents without tertiary education, holding other variables constant.
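The same RRRs can be obtained from the stored coefficients, since the rrr option only changes how results are displayed. A minimal sketch, assuming the mlogit model above (with Partido Democratico as the base outcome) is the active estimation result:

```stata
* e(b) holds log-scale coefficients even when rrr is used for display
display exp(_b[Movimento_5_Stelle:agea])      // RRR of agea, M5S vs. PD
display exp(_b[Movimento_5_Stelle:1.edu_bi])  // RRR of tertiary education, M5S vs. PD
* lincom with the rrr option reports the RRR with its confidence interval
lincom [Movimento_5_Stelle]1.edu_bi, rrr
```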

Probabilities

mlogit prtcleit agea i.gndr_bi i.edu_bi
Iteration 0:  Log likelihood = -411.73463  
Iteration 1:  Log likelihood = -400.95048  
Iteration 2:  Log likelihood = -400.57344  
Iteration 3:  Log likelihood = -400.57044  
Iteration 4:  Log likelihood = -400.57044  

Multinomial logistic regression                         Number of obs =    267
                                                        LR chi2(12)   =  22.33
                                                        Prob > chi2   = 0.0340
Log likelihood = -400.57044                             Pseudo R2     = 0.0271

--------------------------------------------------------------------------------------------------
                        prtcleit | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
---------------------------------+----------------------------------------------------------------
Movimento_5_Stelle               |
                            agea |  -.0166718   .0168768    -0.99   0.323    -.0497497    .0164061
                       1.gndr_bi |  -.3622246   .3402236    -1.06   0.287    -1.029051    .3046013
                        1.edu_bi |  -.8921583   .3913603    -2.28   0.023     -1.65921   -.1251063
                           _cons |   .8451113   .8128062     1.04   0.298    -.7479597    2.438182
---------------------------------+----------------------------------------------------------------
Partido_Democratico              |  (base outcome)
---------------------------------+----------------------------------------------------------------
Lega                             |
                            agea |   .0157313   .0209567     0.75   0.453    -.0253431    .0568056
                       1.gndr_bi |   .0304104   .4043588     0.08   0.940    -.7621182     .822939
                        1.edu_bi |  -1.896445   .6472133    -2.93   0.003     -3.16496   -.6279307
                           _cons |  -1.196412   1.042321    -1.15   0.251    -3.239324    .8464987
---------------------------------+----------------------------------------------------------------
Forza_Italia                     |
                            agea |  -.0117892   .0211426    -0.56   0.577     -.053228    .0296496
                       1.gndr_bi |  -.1336221   .4262747    -0.31   0.754    -.9691051     .701861
                        1.edu_bi |  -.2554771   .4555397    -0.56   0.575    -1.148319    .6373643
                           _cons |   -.372004   1.022091    -0.36   0.716    -2.375266    1.631258
---------------------------------+----------------------------------------------------------------
Fratelli_d_Italia_con_Giorgia_Me |
                            agea |   .0224098   .0183421     1.22   0.222    -.0135402    .0583597
                       1.gndr_bi |   .0633409   .3557715     0.18   0.859    -.6339585    .7606403
                        1.edu_bi |  -.3498695   .3860904    -0.91   0.365    -1.106593    .4068539
                           _cons |  -1.448272   .9203001    -1.57   0.116    -3.252027    .3554831
--------------------------------------------------------------------------------------------------

With margins

  • predicted probabilities
margins gndr_bi
Predictive margins                                         Number of obs = 267
Model VCE: OIM

1._predict: Pr(prtcleit==Movimento_5_Stelle), predict(pr outcome(1))
2._predict: Pr(prtcleit==Partido_Democratico), predict(pr outcome(2))
3._predict: Pr(prtcleit==Lega), predict(pr outcome(3))
4._predict: Pr(prtcleit==Forza_Italia), predict(pr outcome(4))
5._predict: Pr(prtcleit==Fratelli_d_Italia_con_Giorgia_Me), predict(pr outcome(5))

----------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-----------------+----------------------------------------------------------------
_predict#gndr_bi |
            1 0  |   .2641582   .0380342     6.95   0.000     .1896126    .3387038
            1 1  |   .2006323   .0343254     5.85   0.000     .1333557    .2679089
            2 0  |   .3087973    .039933     7.73   0.000       .23053    .3870647
            2 1  |   .3346888   .0399921     8.37   0.000     .2563058    .4130719
            3 0  |   .1308903   .0283682     4.61   0.000     .0752897    .1864909
            3 1  |   .1466289   .0304812     4.81   0.000     .0868869    .2063709
            4 0  |   .1152284   .0279522     4.12   0.000     .0604431    .1700137
            4 1  |   .1096237    .026717     4.10   0.000     .0572594    .1619881
            5 0  |   .1809258   .0333325     5.43   0.000     .1155954    .2462562
            5 1  |   .2084262   .0349256     5.97   0.000     .1399733    .2768791
----------------------------------------------------------------------------------
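In the output above, the first number in each row is the outcome (_predict) and the second is the value of gndr_bi. margins averages individual predicted probabilities, which can also be stored with predict, one new variable per outcome. A minimal sketch, assuming the mlogit model above is the active estimation result; the variable names pr_m5s to pr_fdi and pr_total are arbitrary:

```stata
* One predicted-probability variable per party, in outcome order 1-5
predict pr_m5s pr_pd pr_lega pr_fi pr_fdi if e(sample), pr
* For each respondent, the five probabilities sum to 1
egen pr_total = rowtotal(pr_m5s pr_pd pr_lega pr_fi pr_fdi) if e(sample)
summarize pr_m5s pr_pd pr_lega pr_fi pr_fdi pr_total
```

Because the model includes constants, the mean of each predicted probability should equal the observed share of that party in the estimation sample.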
  • average marginal effects (AMEs)
margins, dydx(gndr_bi)
Average marginal effects                                   Number of obs = 267
Model VCE: OIM

dy/dx wrt: 1.gndr_bi

1._predict: Pr(prtcleit==Movimento_5_Stelle), predict(pr outcome(1))
2._predict: Pr(prtcleit==Partido_Democratico), predict(pr outcome(2))
3._predict: Pr(prtcleit==Lega), predict(pr outcome(3))
4._predict: Pr(prtcleit==Forza_Italia), predict(pr outcome(4))
5._predict: Pr(prtcleit==Fratelli_d_Italia_con_Giorgia_Me), predict(pr outcome(5))

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
0.gndr_bi    |  (base outcome)
-------------+----------------------------------------------------------------
1.gndr_bi    |
    _predict |
          1  |  -.0635259    .051312    -1.24   0.216    -.1640956    .0370439
          2  |   .0258915   .0566176     0.46   0.647    -.0850769    .1368599
          3  |   .0157386   .0416924     0.38   0.706    -.0659769    .0974542
          4  |  -.0056046   .0387318    -0.14   0.885    -.0815175    .0703082
          5  |   .0275004   .0483734     0.57   0.570    -.0673097    .1223104
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
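Because the five predicted probabilities always sum to 1, the AMEs of gndr_bi across the outcomes must sum to 0; with the values above:

\[ -0.0635259 + 0.0258915 + 0.0157386 - 0.0056046 + 0.0275004 = 0 \]

So a higher probability of preferring some parties necessarily comes with lower probabilities for others.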
  • specify the outcome category
margins, dydx(gndr_bi) predict(outcome(2))  
Average marginal effects                                   Number of obs = 267
Model VCE: OIM

Expression: Pr(prtcleit==Partido_Democratico), predict(outcome(2))
dy/dx wrt:  1.gndr_bi

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   1.gndr_bi |   .0258915   .0566176     0.46   0.647    -.0850769    .1368599
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
  • plot predicted probabilities
margins edu_bi
marginsplot, by(edu_bi)

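  • plot predicted probabilities separately for each outcome, holding the other covariates at their means (atmeans), then combine the graphs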
margins edu_bi, atmeans predict(outcome(1))
marginsplot, name(Movimento_5_Stelle) 
margins edu_bi, atmeans predict(outcome(2))
marginsplot, name(Partido_Democratico) 
margins edu_bi, atmeans predict(outcome(3))
marginsplot, name(Lega) 
margins edu_bi, atmeans predict(outcome(4))
marginsplot, name(Forza_Italia) 
margins edu_bi, atmeans predict(outcome(5))
marginsplot, name(Fratelli_d_Italia) 
graph combine Movimento_5_Stelle Partido_Democratico Lega Forza_Italia Fratelli_d_Italia, ycommon

  • plot AMEs (probability differences)
margins, dydx(edu_bi)
marginsplot, recast(scatter)
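A reference line at zero makes it easier to see which confidence intervals include zero. A minimal sketch, assuming the mlogit model above is the active estimation result; yline() and title() are standard twoway options, which marginsplot accepts:

```stata
* AMEs of tertiary education for each party, with a dashed line at zero
margins, dydx(edu_bi)
marginsplot, recast(scatter) yline(0, lpattern(dash)) title("AME of tertiary education by party")
```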

For more on marginsplot, see the documentation here.

3. Ordinal logistic regression

For ordered categorical outcomes.

The proportional odds (or parallel regression) assumption: in ordered logistic regression, the coefficients describing the contrast between the lowest category and all higher categories of the outcome are the same as those describing the contrast between the two lowest categories and all higher categories, and so on. In other words, one set of coefficients applies to every cumulative split of the outcome.
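In formula terms (a sketch using Stata's cutpoint parameterization, with \(J\) ordered categories and cutpoints \(\kappa_j\), the cut values reported at the bottom of the ologit output):

\[ ln\left(\frac{P(Y_i > j | X_i)}{P(Y_i \le j | X_i)}\right) = \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} - \kappa_j, \qquad j = 1, \dots, J-1 \]

The slopes \(\beta_1, \dots, \beta_k\) carry no \(j\) subscript; only the cutpoint changes from one split to the next, which is exactly the proportional odds assumption. There is no separate intercept, since it is absorbed into the cutpoints.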

Let’s say we are interested in predicting political efficacy cptppola (confidence in own ability to participate in politics) in Italy using a set of covariates.

  • check the variable to be used
tab cptppola
tab cptppola, nolabel
    Confident in own |
          ability to |
      participate in |
            politics |      Freq.     Percent        Cum.
---------------------+-----------------------------------
Not at all confident |         29       10.78       10.78
  A little confident |        107       39.78       50.56
     Quite confident |        102       37.92       88.48
      Very confident |         27       10.04       98.51
Completely confident |          4        1.49      100.00
---------------------+-----------------------------------
               Total |        269      100.00


  Confident |
     in own |
 ability to |
participate |
in politics |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         29       10.78       10.78
          2 |        107       39.78       50.56
          3 |        102       37.92       88.48
          4 |         27       10.04       98.51
          5 |          4        1.49      100.00
------------+-----------------------------------
      Total |        269      100.00
  • recode the variable so that categories 1/2 are combined into 0, category 3 becomes 1, and categories 4/5 are combined into 2
recode cptppola (1/2=0) (3=1) (4/5=2), gen(cptppola3)
tab cptppola3, nolabel
(269 differences between cptppola and cptppola3)


  RECODE of |
   cptppola |
 (Confident |
     in own |
 ability to |
participate |
         in |
  politics) |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        136       50.56       50.56
          1 |        102       37.92       88.48
          2 |         31       11.52      100.00
------------+-----------------------------------
      Total |        269      100.00
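A quick way to confirm that the recode did what was intended. This is a minimal sketch on hypothetical toy data (one row per original category plus a missing value), not on lesson8.dta; because it starts with clear, run it in a separate session. It also shows that recode leaves missing values missing, and attaches value labels (the label name eff3 is hypothetical) so that later tables show what 0/1/2 mean. On the lesson data itself, tab cptppola cptppola3 gives the same check as a cross-tabulation.

```stata
* Hypothetical toy data, NOT lesson8.dta: one row per original category plus a missing value.
* -clear- drops the data in memory, so run this in a separate session.
clear
input cptppola
1
2
3
4
5
.
end

* Same recode as in the lesson: 1/2 -> 0, 3 -> 1, 4/5 -> 2
recode cptppola (1/2=0) (3=1) (4/5=2), gen(cptppola3)

* Value labels (hypothetical label name eff3) so that tables show what 0/1/2 mean
label define eff3 0 "Low (1-2)" 1 "Medium (3)" 2 "High (4-5)", replace
label values cptppola3 eff3

* Verify the mapping; -assert- stops with an error if any check fails
assert cptppola3 == 0 if inlist(cptppola, 1, 2)
assert cptppola3 == 1 if cptppola == 3
assert cptppola3 == 2 if inlist(cptppola, 4, 5)
assert missing(cptppola3) if missing(cptppola)
list, clean
```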

Log odds

  • fit ologit
ologit cptppola3 agea i.gndr_bi i.edu_bi
Iteration 0:  Log likelihood = -254.42882  
Iteration 1:  Log likelihood = -239.22197  
Iteration 2:  Log likelihood = -239.08618  
Iteration 3:  Log likelihood = -239.08593  
Iteration 4:  Log likelihood = -239.08593  

Ordered logistic regression                             Number of obs =    265
                                                        LR chi2(3)    =  30.69
                                                        Prob > chi2   = 0.0000
Log likelihood = -239.08593                             Pseudo R2     = 0.0603

------------------------------------------------------------------------------
   cptppola3 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        agea |  -.0065224   .0121352    -0.54   0.591    -.0303069    .0172621
   1.gndr_bi |  -.4741244   .2442656    -1.94   0.052    -.9528763    .0046274
    1.edu_bi |   1.402793   .2786091     5.03   0.000     .8567294    1.948857
-------------+----------------------------------------------------------------
       /cut1 |   -.153444   .5886419                     -1.307161    1.000273
       /cut2 |   2.093408   .6103984                      .8970495    3.289767
------------------------------------------------------------------------------

A one-unit increase in age (agea) is associated with an expected 0.007 decrease in the log odds of being in a higher rather than a lower category of political efficacy (cptppola3), holding all other variables constant. However, this effect is not statistically significant at the 0.05 level (p = 0.591).

Odds ratios: exp(beta)

  • use the or option to report odds ratios
ologit cptppola3 agea i.gndr_bi i.edu_bi, or
Iteration 0:  Log likelihood = -254.42882  
Iteration 1:  Log likelihood = -239.22197  
Iteration 2:  Log likelihood = -239.08618  
Iteration 3:  Log likelihood = -239.08593  
Iteration 4:  Log likelihood = -239.08593  

Ordered logistic regression                             Number of obs =    265
                                                        LR chi2(3)    =  30.69
                                                        Prob > chi2   = 0.0000
Log likelihood = -239.08593                             Pseudo R2     = 0.0603

------------------------------------------------------------------------------
   cptppola3 | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        agea |   .9934988   .0120563    -0.54   0.591     .9701477    1.017412
   1.gndr_bi |   .6224298   .1520382    -1.94   0.052     .3856303    1.004638
    1.edu_bi |   4.066543   1.132976     5.03   0.000     2.355444    7.020659
-------------+----------------------------------------------------------------
       /cut1 |   -.153444   .5886419                     -1.307161    1.000273
       /cut2 |   2.093408   .6103984                      .8970495    3.289767
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation to odds ratios.
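The odds ratios are simply the exponentiated coefficients from the previous table. The sketch below reproduces them with the rounded values copied from the output; immediately after fitting the model, exp(_b[1.edu_bi]) would use the stored estimate instead. Read under the proportional odds assumption, edu_bi = 1 is associated with about 4.07 times the odds of being in a higher rather than a lower category of cptppola3, at either cutpoint, holding the other variables constant.

```stata
* Odds ratios = exp(coefficients); values copied (rounded) from the ologit output above
display exp(1.402793)         // 1.edu_bi: about 4.07
display exp(-.4741244)        // 1.gndr_bi: about 0.62
display exp(-.0065224)        // agea, one unit more: about 0.993
display exp(10 * -.0065224)   // agea, ten units more: about 0.937 (effects compound)
* The cutpoints /cut1 and /cut2 stay on the logit scale (see the Note in the output)
```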

Probabilities

ologit cptppola3 agea i.gndr_bi i.edu_bi
Iteration 0:  Log likelihood = -254.42882  
Iteration 1:  Log likelihood = -239.22197  
Iteration 2:  Log likelihood = -239.08618  
Iteration 3:  Log likelihood = -239.08593  
Iteration 4:  Log likelihood = -239.08593  

Ordered logistic regression                             Number of obs =    265
                                                        LR chi2(3)    =  30.69
                                                        Prob > chi2   = 0.0000
Log likelihood = -239.08593                             Pseudo R2     = 0.0603

------------------------------------------------------------------------------
   cptppola3 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        agea |  -.0065224   .0121352    -0.54   0.591    -.0303069    .0172621
   1.gndr_bi |  -.4741244   .2442656    -1.94   0.052    -.9528763    .0046274
    1.edu_bi |   1.402793   .2786091     5.03   0.000     .8567294    1.948857
-------------+----------------------------------------------------------------
       /cut1 |   -.153444   .5886419                     -1.307161    1.000273
       /cut2 |   2.093408   .6103984                      .8970495    3.289767
------------------------------------------------------------------------------
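margins computes these probabilities for us, but it can help to see where they come from. A minimal sketch, using the rounded coefficients and cutpoints from the output above and a hypothetical respondent (agea = 50, gndr_bi = 0, edu_bi = 1, chosen only for illustration; the scalar names are also made up): Pr(Y ≤ j) is invlogit(/cut_j − xb), and each category probability is a difference of these cumulative probabilities. After fitting the model, margins, at(agea=50 gndr_bi=0 edu_bi=1) should return the same three probabilities up to rounding.

```stata
* Hypothetical profile: agea = 50, gndr_bi = 0, edu_bi = 1 (illustration only)
* Coefficients and cutpoints copied (rounded) from the ologit output above
scalar ex_xb = -.0065224*50 + (-.4741244)*0 + 1.402793*1   // linear predictor, no intercept

* Cumulative probabilities: Pr(cptppola3 <= j) = invlogit(cutpoint - xb)
scalar ex_cum0 = invlogit(-.153444 - ex_xb)    // Pr(cptppola3 <= 0), uses /cut1
scalar ex_cum1 = invlogit(2.093408 - ex_xb)    // Pr(cptppola3 <= 1), uses /cut2

* Category probabilities are differences of cumulative probabilities
scalar ex_p0 = ex_cum0
scalar ex_p1 = ex_cum1 - ex_cum0
scalar ex_p2 = 1 - ex_cum1

display "Pr(cptppola3 = 0) = " ex_p0
display "Pr(cptppola3 = 1) = " ex_p1
display "Pr(cptppola3 = 2) = " ex_p2
display "Sum               = " ex_p0 + ex_p1 + ex_p2
```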

With margins

  • predicted probabilities
margins gndr_bi
Predictive margins                                         Number of obs = 265
Model VCE: OIM

1._predict: Pr(cptppola3==0), predict(pr outcome(0))
2._predict: Pr(cptppola3==1), predict(pr outcome(1))
3._predict: Pr(cptppola3==2), predict(pr outcome(2))

----------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-----------------+----------------------------------------------------------------
_predict#gndr_bi |
            1 0  |   .4490069   .0399203    11.25   0.000     .3707645    .5272493
            1 1  |   .5564695   .0398033    13.98   0.000     .4784564    .6344826
            2 0  |   .4143869    .034125    12.14   0.000     .3475031    .4812706
            2 1  |   .3517536   .0330428    10.65   0.000     .2869908    .4165163
            3 0  |   .1366062   .0253045     5.40   0.000     .0870103    .1862021
            3 1  |   .0917769   .0189291     4.85   0.000     .0546765    .1288774
----------------------------------------------------------------------------------
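How to read the _predict#gndr_bi table: the first number is the outcome listed in the header (1 = Pr(cptppola3==0), 2 = Pr(cptppola3==1), 3 = Pr(cptppola3==2)) and the second is the value of gndr_bi. Within each gender group the three predicted probabilities must add up to 1, which the arithmetic below checks using the values copied from the table.

```stata
* Predictive margins copied from the table above
* rows "1 0", "2 0", "3 0": Pr(cptppola3 == 0/1/2) for gndr_bi == 0
display .4490069 + .4143869 + .1366062   // gndr_bi == 0: sums to 1 (up to rounding)
* rows "1 1", "2 1", "3 1": Pr(cptppola3 == 0/1/2) for gndr_bi == 1
display .5564695 + .3517536 + .0917769   // gndr_bi == 1: sums to 1 (up to rounding)
```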
  • average marginal effects (AMEs)
margins, dydx(agea)
margins, dydx(gndr_bi)
Average marginal effects                                   Number of obs = 265
Model VCE: OIM

dy/dx wrt: agea

1._predict: Pr(cptppola3==0), predict(pr outcome(0))
2._predict: Pr(cptppola3==1), predict(pr outcome(1))
3._predict: Pr(cptppola3==2), predict(pr outcome(2))

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
agea         |
    _predict |
          1  |   .0014758   .0027411     0.54   0.590    -.0038966    .0068482
          2  |  -.0008604   .0015988    -0.54   0.590     -.003994    .0022732
          3  |  -.0006154   .0011477    -0.54   0.592    -.0028648     .001634
------------------------------------------------------------------------------


Average marginal effects                                   Number of obs = 265
Model VCE: OIM

dy/dx wrt: 1.gndr_bi

1._predict: Pr(cptppola3==0), predict(pr outcome(0))
2._predict: Pr(cptppola3==1), predict(pr outcome(1))
3._predict: Pr(cptppola3==2), predict(pr outcome(2))

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
0.gndr_bi    |  (base outcome)
-------------+----------------------------------------------------------------
1.gndr_bi    |
    _predict |
          1  |   .1074626   .0547045     1.96   0.049     .0002437    .2146815
          2  |  -.0626333    .032445    -1.93   0.054    -.1262243    .0009577
          3  |  -.0448293   .0236709    -1.89   0.058    -.0912234    .0015648
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
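Two checks that connect the AMEs to the predictive margins above, using values copied from the tables. For a factor variable, the AME is the discrete change, so each dy/dx for 1.gndr_bi equals the gndr_bi = 1 margin minus the gndr_bi = 0 margin for the same outcome. And because the three probabilities always sum to 1, the AMEs of a continuous variable such as agea sum to 0 across outcomes. After fitting the model, margins r.gndr_bi should report the same gender contrasts directly.

```stata
* AME of 1.gndr_bi = margin at gndr_bi == 1 minus margin at gndr_bi == 0
display .5564695 - .4490069   // Pr(cptppola3 == 0): matches dy/dx .1074626
display .3517536 - .4143869   // Pr(cptppola3 == 1): matches dy/dx -.0626333
display .0917769 - .1366062   // Pr(cptppola3 == 2): matches dy/dx -.0448293

* AMEs of agea across the three outcomes sum to zero
display .0014758 - .0008604 - .0006154
```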
  • plot predicted probabilities
margins edu_bi
marginsplot, by(edu_bi)

  • plot AMEs (probability differences)
margins, dydx(edu_bi)
marginsplot, recast(scatter)

Graph Editor in Stata!

“With Stata’s Graph Editor, you can change how your graph looks. You can add. You can remove. You can move. You can modify. Anything…” See the documentation here

It helps us tidy up our graphs and make them pop! So please use it for your projects.

We’ve noticed that you tend to hold off on in-class assignments when they aren’t required.

Image source here

So… we have to “force” you to take a step forward and finish up the work you haven’t completed.

Last in-class assignment (mandatory!)

Due by 23:59 tomorrow, 20/11/2024. The outcome is binary: you either upload it or you don't, with no room for further negotiation.


  1. Complete and review your assignment from last week, and upload your do-file along with your interpretations of the model outputs (just inline comments within your do-file). I expect to see your comments on the interaction terms.

By the end, you should have obtained graphs similar to these:

  2. Upload your DAGs following the instructions provided in last week’s DAGs slides: