1 Logistic Regression

  • Logistic regression is often used to analyse the effect of a continuous or categorical explanatory (or independent) variable on a binary response variable.
  • For grouped count data, logistic regression assumes a binomial distribution for the response.
  • For binary response data, logistic regression assumes a Bernoulli distribution, which is the single-trial case of the binomial (Crawley 2013).
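The model behind both cases can be written compactly as follows. The symbols p (probability that the response is 1), n (number of trials at a given X), and β0, β1 (intercept and slope) are notation assumed here for illustration and are not taken from the source.

```latex
\begin{aligned}
Y_1 &\sim \mathrm{Binomial}(n, p) \quad (\text{Bernoulli when } n = 1), \\
\operatorname{logit}(p) &= \log\frac{p}{1 - p} = \beta_0 + \beta_1 X, \\
p &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}} .
\end{aligned}
```

On this scale, a one-unit increase in X multiplies the odds p / (1 - p) by exp(β1).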

2 Objectives:

  • To analyse data with logistic regression, paying attention to the problem of overdispersion.

3 Binary Logistic Regression

3.1 logitbin Dataset:

  • the independent variable X
  • the dependent variable Y, a binary outcome coded 0 or 1
   Object  X Y
1       1  1 0
2       2  1 0
3       3  3 1
4       4  4 0
5       5  5 1
6       6  1 0
7       7  2 0
8       8  2 0
9       9  2 0
10     10  3 0
11     11  4 0
12     12  5 0
13     13  6 0
14     14  6 0
15     15  6 1
16     16  7 0
17     17  7 0
18     18  7 1
19     19  7 1
20     20  7 0
21     21  8 1
22     22  8 1
23     23  8 1
24     24  8 1
25     25  8 1
26     26  9 1
27     27  9 1
28     28  9 1
29     29 10 1
30     30 11 1
31     31 11 1
32     32 11 1
33     33 12 1
34     34 13 1
35     35 13 1
36     36  3 0
37     37  3 0
'data.frame':   37 obs. of  3 variables:
 $ Object: num  1 2 3 4 5 6 7 8 9 10 ...
 $ X     : num  1 1 3 4 5 1 2 2 2 3 ...
 $ Y     : num  0 0 1 0 1 0 0 0 0 0 ...
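The report does not show the call used for the binary model, so the following is only a sketch of how a model such as logitbin_model_01 might be fitted. The data frame is transcribed from the listing above; the formula Y ~ X and the logit link are assumptions chosen to mirror the proportional model in section 4.2.

```r
# Data transcribed from the logitbin listing (37 observations).
logitbin <- data.frame(
  Object = 1:37,
  X = c(1, 1, 3, 4, 5, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 7, 7, 7, 7,
        8, 8, 8, 8, 8, 9, 9, 9, 10, 11, 11, 11, 12, 13, 13, 3, 3),
  Y = c(0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0)
)

# Assumed call: binary (Bernoulli) logistic regression of Y on X.
logitbin_model_01 <- glm(Y ~ X,
                         family = binomial(link = "logit"),
                         data = logitbin)

summary(logitbin_model_01)
```

Because each row is a single 0/1 trial, the response can be passed to glm() directly, without cbind().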

4 Proportional logistic regression (PLR)

4.1 logitprop Dataset:

  • two response columns, Y0 and Y1, holding the number of 0 outcomes and 1 outcomes, respectively, at each level of X.
    X Y0 Y1
1   1  3  0
2   2  3  0
3   3  3  1
4   4  2  0
5   5  1  1
6   6  2  1
7   7  3  2
8   8  0  5
9   9  0  3
10 10  0  1
11 11  0  3
12 12  0  1
13 13  0  2
'data.frame':   13 obs. of  3 variables:
 $ X : num  1 2 3 4 5 6 7 8 9 10 ...
 $ Y0: num  3 3 3 2 1 2 3 0 0 0 ...
 $ Y1: num  0 0 1 0 1 1 2 5 3 1 ...
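The counts in logitprop match what you get by tabulating the binary outcomes of logitbin at each level of X. The report does not say how logitprop was built, so this is only one possible way to derive it, using table() as an assumed approach.

```r
# Binary data transcribed from the logitbin listing (X and Y only).
logitbin <- data.frame(
  X = c(1, 1, 3, 4, 5, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 7, 7, 7, 7,
        8, 8, 8, 8, 8, 9, 9, 9, 10, 11, 11, 11, 12, 13, 13, 3, 3),
  Y = c(0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0)
)

# Cross-tabulate: rows are levels of X, columns are the outcomes "0" and "1".
counts <- table(logitbin$X, logitbin$Y)

# Reshape the table into one row per level of X.
logitprop <- data.frame(
  X  = as.numeric(rownames(counts)),
  Y0 = as.numeric(counts[, "0"]),
  Y1 = as.numeric(counts[, "1"])
)

str(logitprop)
```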

4.2 PLR: logitprop_model_02


Call:
glm(formula = cbind(Y1, Y0) ~ X, family = binomial(link = logit), 
    data = logitprop)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.3073  -0.4377   0.1172   0.6303   1.3465  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  -5.1093     1.7755  -2.878  0.00401 **
X             0.8406     0.2686   3.130  0.00175 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 33.2289  on 12  degrees of freedom
Residual deviance:  7.0578  on 11  degrees of freedom
AIC: 17.917

Number of Fisher Scoring iterations: 6

Exponentiated coefficients, exp(coef), on the odds scale:
(Intercept)           X 
0.006040245 2.317778111 
(Intercept)           X 
0.006040245 2.317778111 

5 Conclusion:

  • The binary logistic regression “logitbin_model_01” and the proportional logistic regression “logitprop_model_02” give the same parameter estimates.
  • Both models fit poorly at intermediate values of X.
  • Neither model shows any indication of overdispersion; for logitprop_model_02 the residual deviance (7.06) is smaller than its residual degrees of freedom (11).
  • Hence we conclude that X has a significant positive effect on the odds of Y1 (estimate 0.8406, p = 0.00175).
  • The odds of Y1 increase by a factor of about 2.32 (exp(0.8406) = 2.317778) for each unit increase in X.
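To see where the fit is weakest, the observed proportions can be set beside the probabilities implied by the reported coefficients. This is a sketch only: it uses the rounded estimates from the summary (-5.1093 and 0.8406), and the names b0, b1, comparison, and x_half are introduced here for illustration.

```r
# Grouped counts transcribed from the logitprop listing.
logitprop <- data.frame(
  X  = 1:13,
  Y0 = c(3, 3, 3, 2, 1, 2, 3, 0, 0, 0, 0, 0, 0),
  Y1 = c(0, 0, 1, 0, 1, 1, 2, 5, 3, 1, 3, 1, 2)
)

# Rounded coefficient estimates copied from the model summary.
b0 <- -5.1093
b1 <- 0.8406

# Fitted probability of outcome 1 at each X (inverse logit).
fitted_p <- plogis(b0 + b1 * logitprop$X)

# Observed proportion of outcome 1 at each X.
n_trials   <- logitprop$Y0 + logitprop$Y1
observed_p <- logitprop$Y1 / n_trials

comparison <- data.frame(
  X        = logitprop$X,
  n        = n_trials,
  observed = round(observed_p, 3),
  fitted   = round(fitted_p, 3)
)
print(comparison)

# Value of X at which the fitted probability equals 0.5.
x_half <- -b0 / b1
print(x_half)
```

Rows where observed and fitted differ most mark the levels of X at which the model describes the data least well. With only one to five trials per level, the observed proportions are themselves noisy.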

Reference

Crawley, Michael J. 2013. The R Book. 2nd ed. John Wiley & Sons.